Image Inference Guide

Cosmos-Transfer2.5 image inference runs the model either on a single frame or on a control video with an image supplied as a style reference. This guide covers the prerequisites, the image-to-image and image-prompt (style reference) workflows, the relevant JSON parameters, and the torchrun command for multi-GPU inference.

Prerequisites

  1. Follow the Setup guide for environment setup, checkpoint download, and hardware requirements.

Image-to-Image

Transform a single image or video frame using control signals and text prompts:

python examples/inference.py -i assets/image_example/image2image.json -o outputs/image2image

Image Prompt: using an image as a style reference

For more detailed guidance and examples on image prompting, check out our Cosmos Cookbook Style-Guided Inference recipe.

Use an image as a style reference to guide video generation with a particular visual aesthetic.

python examples/inference.py -i assets/image_example/image_style.json -o outputs/image_style

Or use torchrun for multi-GPU inference:

torchrun --nproc_per_node=8 --master_port=12341 examples/inference.py -i assets/image_example/image_style.json -o outputs/image_style/
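If you launch these runs from a script, the two invocations above differ only in their launcher prefix. The following is a minimal sketch of that idea, not part of the repository: `build_inference_command` is a hypothetical helper, and it uses only the flags shown in the commands above (`-i`, `-o`, `--nproc_per_node`, `--master_port`).

```python
import shlex


def build_inference_command(config_path, output_dir, num_gpus=1, master_port=12341):
    """Return the argument list for a single-GPU or multi-GPU run.

    Hypothetical helper: it only assembles the commands shown in this guide.
    """
    script_args = ["examples/inference.py", "-i", config_path, "-o", output_dir]
    if num_gpus > 1:
        # Multi-GPU: launch one process per GPU through torchrun.
        return [
            "torchrun",
            f"--nproc_per_node={num_gpus}",
            f"--master_port={master_port}",
            *script_args,
        ]
    # Single GPU: plain Python invocation.
    return ["python", *script_args]


single = build_inference_command(
    "assets/image_example/image_style.json", "outputs/image_style"
)
multi = build_inference_command(
    "assets/image_example/image_style.json", "outputs/image_style", num_gpus=8
)
print(shlex.join(single))
print(shlex.join(multi))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues when paths contain spaces.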

For an explanation of all available parameters, run:

python examples/inference.py --help

python examples/inference.py control:edge --help # for information specific to edge control

Configuration

Image-to-Image

{
    "name": "image_to_image",
    "prompt": "A scenic drive unfolds along a coastal highway...",

    // The input video. {max_frames} frames are extracted from it.
    "video_path": "coastal_highway.mp4",
    "max_frames": 1,

    // Generate only the first frame
    "num_video_frames_per_chunk": 1,

    "seed": 1,
    "edge": {}  // Control computed on the fly
}
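To produce many single-frame configs (for example, one per input video), it can help to generate the file programmatically. The sketch below rebuilds the image-to-image configuration above as a Python dictionary and serializes it as strict JSON. The function name `make_image_to_image_config` is hypothetical; the field names and values are taken from the listing above, and the prompt is kept elided as in the original.

```python
import json
from pathlib import Path


def make_image_to_image_config(video_path, prompt, seed=1):
    """Build a single-frame config mirroring the listing above (hypothetical helper)."""
    return {
        "name": "image_to_image",
        "prompt": prompt,
        "video_path": video_path,
        # Extract one frame from the input video ...
        "max_frames": 1,
        # ... and generate a single frame.
        "num_video_frames_per_chunk": 1,
        "seed": seed,
        # Empty object: the edge control is computed on the fly.
        "edge": {},
    }


def write_config(config, path):
    """Write the config as strict JSON (no comments), creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return path


config = make_image_to_image_config(
    "coastal_highway.mp4",
    "A scenic drive unfolds along a coastal highway...",
)
text = json.dumps(config, indent=4)
print(text)
```

Calling `write_config(config, "my_configs/image2image.json")` (a hypothetical path) writes a file that can then be passed to `examples/inference.py` with `-i`.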

Image Prompt

{
    "name": "image_style",
    "prompt": "The camera moves steadily forward...",

    // Input video that determines the control signals for the generation
    "video_path": "calm_street.mp4",

    // Reference image that determines the style of the generated video
    "image_context_path": "sunset.jpg",

    "seed": 1,
    "edge": {}
}
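The listings above are annotated with `//` comments, which Python's standard `json` module rejects. This guide does not state how `examples/inference.py` parses its input, so if you want to inspect or validate an annotated config in your own tooling, one option is to strip the comments first. This is a minimal sketch under that assumption; `strip_line_comments` is a hypothetical helper, and it handles only `//` line comments, not `/* ... */` blocks.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # This character was escaped; consume it.
            elif ch == "\\":
                escaped = True           # Next character is escaped.
            elif ch == '"':
                in_string = False        # Closing quote.
            i += 1
        elif ch == '"':
            in_string = True             # Opening quote.
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """
{
    "name": "image_style",
    // Input video that determines the control signals for the generation
    "video_path": "calm_street.mp4",
    // Reference image that determines the style of the generated video
    "image_context_path": "sunset.jpg",
    "seed": 1,
    "edge": {}  // Control computed on the fly
}
"""

config = json.loads(strip_line_comments(annotated))
print(config["video_path"], config["image_context_path"])
```

Tracking whether the scanner is inside a string keeps values such as URLs (which contain `//`) intact, which a plain regular-expression substitution would not guarantee.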

Examples

Image Prompt

Input Video | Reference Image | Output Video
calm_street.mp4 | Reference image | image_style_output.mp4