Cosmos-Transfer2.5 image inference runs our model on single frames or on control videos that use an image as a style reference. This guide covers the setup prerequisites, the image-to-image and style-reference workflow examples, relevant JSON parameters, and torchrun commands for multi-GPU scaling.
- Follow the Setup guide for environment configuration, checkpoint download, and hardware requirements.
Transform a single image or video frame using control signals and text prompts:
```bash
python examples/inference.py -i assets/image_example/image2image.json -o outputs/image2image
```

For more detailed guidance and examples on image prompting, check out our Cosmos Cookbook Style-Guided Inference recipe.
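If you generate many config variants (different prompts, videos, or seeds), writing them from a script avoids copy-paste errors. A minimal sketch, assuming the config schema shown in the image-to-image example later in this guide; the `write_config` helper and the output filename are hypothetical:

```python
import json
from pathlib import Path

def write_config(path, prompt, video_path, seed=1):
    """Write a single-frame image-to-image config (schema from the example in this guide)."""
    cfg = {
        "name": Path(path).stem,
        "prompt": prompt,
        "video_path": video_path,
        "max_frames": 1,                 # generate only the first frame
        "num_video_frames_per_chunk": 1,
        "seed": seed,
        "edge": {},                      # edge control computed on the fly
    }
    Path(path).write_text(json.dumps(cfg, indent=2))
    return cfg

cfg = write_config(
    "image2image_variant.json",
    "A scenic drive unfolds along a coastal highway...",
    "coastal_highway.mp4",
)
```

Pass the resulting file to `examples/inference.py -i` as shown above.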
Use an image as a style reference to guide video generation with a particular visual aesthetic.
```bash
python examples/inference.py -i assets/image_example/image_style.json -o outputs/image_style
```

Or use torchrun for multi-GPU inference:
```bash
torchrun --nproc_per_node=8 --master_port=12341 examples/inference.py -i assets/image_example/image_style.json -o outputs/image_style/
```

For an explanation of all the available parameters, run:
```bash
python examples/inference.py --help
python examples/inference.py control:edge --help  # for information specific to edge control
```

```json
{
  "name": "image_style",
  "prompt": "The camera moves steadily forward...",
  // Input video that determines the control signals for the generation
  "video_path": "calm_street.mp4",
  // Reference image that determines the style of the generated video
  "image_context_path": "sunset.jpg",
  "seed": 1,
  "edge": {}
}
```

| Input Video | Reference Image | Output Video |
|---|---|---|
| calm_street.mp4 | sunset.jpg | image_style_output.mp4 |
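The config above is shown with `//` comments for readability, but Python's standard `json` module rejects them. If you keep comments in your own copies and want to validate them in a script, one option is stripping the comments first. A minimal sketch (the `load_jsonc` helper is hypothetical, and the naive regex would also mangle a `//` inside a string value, so keep URLs out of commented configs):

```python
import json
import re

def load_jsonc(text):
    """Parse JSON that may contain // line comments.
    Naive: a '//' inside a quoted string would also be stripped."""
    stripped = re.sub(r"^\s*//.*$|\s+//.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)

cfg = load_jsonc("""
{
  "name": "image_style",
  // Input video that determines the control signals
  "video_path": "calm_street.mp4",
  "seed": 1,
  "edge": {}
}
""")
```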

{ "name": "image_to_image", "prompt": "A scenic drive unfolds along a coastal highway...", // The input video. We'll extract the {max_frames} frames from the video. "video_path": "coastal_highway.mp4", "max_frames": 1, // Generate only the first frame "num_video_frames_per_chunk": 1, "seed": 1, "edge": {} // Control computed on the fly }