Wan Model Image-to-Video API Documentation

Wan/Alibaba Cloud provides high-quality image-to-video generation models. This document specifies the API for image-to-video generation with these models. All video generation calls use the same /v1/video/generations endpoint; the parameters differ by use case.

Supported Models

Currently supported models include:

Model	Description
wan2.5-i2v-preview	Wan 2.5 image-to-video generation model (preview)
wan2.6-i2v	Wan 2.6 image-to-video generation model
wan2.1-kf2v-plus	Wan 2.1 first-last frame to video generation model

Overview

The Wan model image-to-video feature provides an asynchronous task processing mechanism:

Submit Task: Send an image and text prompt to create a video generation task
Query Status: Query generation progress and status through task ID
Get Results: Retrieve the generated video file after task completion

Task Status Flow

queued → in_progress → completed
                ↓
            failed

queued: Task has been submitted and is waiting to be processed
in_progress: Task is being processed
completed: Task completed successfully, video has been generated
failed: Task failed
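
The submit, query, and retrieve flow above can be sketched as a small polling helper. This is a minimal sketch, not an official client: `wait_for_task`, `read_status`, and the caller-supplied `fetch_status` function are hypothetical names, and the polling interval and timeout are arbitrary assumptions. Because the flow above lists lowercase states while the status query examples return upper-case values such as SUCCESS and FAILURE, the sketch normalizes case and accepts both spellings of the terminal states.

```python
import time
from typing import Callable

# Terminal states. Both spellings are accepted because the status flow lists
# "completed"/"failed" while the query examples return "SUCCESS"/"FAILURE".
TERMINAL_STATUSES = {"SUCCESS", "FAILURE", "COMPLETED", "FAILED"}


def read_status(response: dict) -> str:
    """Return the upper-cased task status from a submit or query response."""
    body = response.get("data")
    if not isinstance(body, dict):
        body = response  # the submit response carries "status" at the top level
    return str(body.get("status", "")).upper()


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,    # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal state and return that response.

    fetch_status is a hypothetical caller-supplied function that performs
    GET /v1/video/generations/{task_id} and returns the parsed JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        status = read_status(response)
        if status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Passing the HTTP call in as `fetch_status` keeps the loop independent of any particular HTTP library.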

API List

Method	Path	Description
POST	/v1/video/generations	Submit video generation task
GET	/v1/video/generations/{task_id}	Query task status

Usage Examples

1. Basic Image-to-Video (First Frame)

The simplest form of image-to-video generation uses a single image as the first frame. The first frame is specified via the input_reference field of the request. It can be either a URL or base64-encoded data.

Note: Unlike Veo, the base64 data must be presented in data URI format, in which the encoded data is prefixed with the MIME type: data:{MIME_TYPE};base64,{base64_data}, as opposed to simply sending the base64 data. See official documentation for examples and further detail.
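
As an illustration of the data URI format described above, the following sketch builds an input_reference value from raw image bytes. The helper name `to_data_uri` is hypothetical, and inferring the MIME type from the file name is an assumption made for brevity.

```python
import base64
import mimetypes


def to_data_uri(image_bytes: bytes, filename: str) -> str:
    """Encode image bytes as data:{MIME_TYPE};base64,{base64_data}."""
    # Assumption: the file extension is a reliable hint for the MIME type.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the input_reference value, for example `to_data_uri(open("first-frame.png", "rb").read(), "first-frame.png")`.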

Request Body:

{
  "prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
  "model": "wan2.5-i2v-preview",
  "input_reference": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
  "metadata": {
    "input": {
      "negative_prompt": "blurry, low quality, distorted"
    },
    "parameters": {
      "resolution": "1080P",
      "duration": 5,
      "audio": true,
      "watermark": false,
      "prompt_extend": false
    }
  }
}

Or using a URL:

{
  "prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
  "model": "wan2.5-i2v-preview",
  "input_reference": "https://example.com/first-frame.png",
  "metadata": {
    "input": {
      "negative_prompt": "blurry, low quality, distorted"
    },
    "parameters": {
      "resolution": "1080P",
      "duration": 5,
      "audio": true,
      "watermark": false,
      "prompt_extend": false
    }
  }
}

Complete Request (base64):

curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_KEY" \
  -d '{
    "prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
    "model": "wan2.5-i2v-preview",
    "input_reference": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
    "metadata": {
      "input": {
        "negative_prompt": "blurry, low quality, distorted"
      },
      "parameters": {
        "resolution": "1080P",
        "duration": 5,
        "audio": true,
        "watermark": false,
        "prompt_extend": false
      }
    }
  }'

Complete Request (URL):

curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_KEY" \
  -d '{
    "prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
    "model": "wan2.5-i2v-preview",
    "input_reference": "https://example.com/first-frame.png",
    "metadata": {
      "input": {
        "negative_prompt": "blurry, low quality, distorted"
      },
      "parameters": {
        "resolution": "1080P",
        "duration": 5,
        "audio": true,
        "watermark": false,
        "prompt_extend": false
      }
    }
  }'

2. First and Last Frames

This feature is currently supported only by the wan2.1-kf2v-plus model. The first and last frames are specified via the metadata.input.first_frame_url and metadata.input.last_frame_url fields.

Note: Unlike the first-frame only image-to-video generation use case, these fields only accept URLs, not base64-encoded data.

Limitations: In first-and-last-frame mode, resolution is fixed at 720P, duration is fixed at 5 seconds, and the audio and shot_type parameters are not available.

Request Body:

{
  "prompt": "The hand-shaped statue cracks and collapses, with pieces from above the wrist falling into the water.",
  "model": "wan2.1-kf2v-plus",
  "metadata": {
    "input": {
      "first_frame_url": "https://example.com/first-frame.png",
      "last_frame_url": "https://example.com/last-frame.png",
      "negative_prompt": "blurry, low quality, distorted"
    },
    "parameters": {
      "watermark": false,
      "prompt_extend": false,
      "seed": 12345
    }
  }
}

Complete Request:

curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_KEY" \
  -d '{
    "prompt": "The hand-shaped statue cracks and collapses, with pieces from above the wrist falling into the water.",
    "model": "wan2.1-kf2v-plus",
    "metadata": {
      "input": {
        "first_frame_url": "https://example.com/first-frame.png",
        "last_frame_url": "https://example.com/last-frame.png",
        "negative_prompt": "blurry, low quality, distorted"
      },
      "parameters": {
        "watermark": false,
        "prompt_extend": false,
        "seed": 12345
      }
    }
  }'
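
The first-and-last-frame limitations can be checked on the client before a request is sent. The sketch below assembles the request body shown above and rejects inputs that conflict with the stated limitations. The function name `build_keyframe_request` is hypothetical, and treating only http and https URLs as valid is an assumption; the source says only that base64-encoded data is not accepted.

```python
from typing import Optional
from urllib.parse import urlparse

KEYFRAME_MODEL = "wan2.1-kf2v-plus"
UNSUPPORTED_PARAMETERS = {"audio", "shot_type"}  # not available in this mode


def build_keyframe_request(
    prompt: str,
    first_frame_url: str,
    last_frame_url: str,
    parameters: Optional[dict] = None,
) -> dict:
    """Build a first-and-last-frame request body, rejecting unsupported input."""
    frames = {"first_frame_url": first_frame_url, "last_frame_url": last_frame_url}
    for name, url in frames.items():
        # Assumption: only http(s) URLs are valid; data URIs are rejected.
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"{name} must be a URL, not base64-encoded data")

    parameters = dict(parameters or {})
    rejected = sorted(UNSUPPORTED_PARAMETERS.intersection(parameters))
    if rejected:
        raise ValueError(f"not available in first-and-last-frame mode: {rejected}")
    if parameters.get("resolution", "720P") != "720P":
        raise ValueError("resolution is fixed at 720P in first-and-last-frame mode")
    if parameters.get("duration", 5) != 5:
        raise ValueError("duration is fixed at 5 seconds in first-and-last-frame mode")

    return {
        "prompt": prompt,
        "model": KEYFRAME_MODEL,
        "metadata": {"input": frames, "parameters": parameters},
    }
```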

Request Parameters:

Parameter	Type	Required	Description
model	string	Yes	Model name, e.g., wan2.5-i2v-preview or wan2.1-kf2v-plus
prompt	string	Yes	Text prompt describing the video content to be generated
input_reference	string	Yes (first frame mode)	URL or base64-encoded data (data URI format) for the first frame
metadata	object	No	Metadata object containing `input` and `parameters` sub-objects for specifying optional fields from the official Wan request format

metadata.input Parameters:

Parameter	Type	Required	Description
img_url	string	No	URL for the first frame image. Note: In first-frame mode, this can also be provided via the top-level `input_reference` field. For first-and-last-frame mode (wan2.1-kf2v-plus), use `first_frame_url` and `last_frame_url` instead
first_frame_url	string	Yes (first and last frame mode)	URL for the first frame image. Supported model: wan2.1-kf2v-plus (first-and-last-frame mode only, accepts URLs only, not base64-encoded data)
last_frame_url	string	Yes (first and last frame mode)	URL for the last frame image. Supported model: wan2.1-kf2v-plus (first-and-last-frame mode only, accepts URLs only, not base64-encoded data)
negative_prompt	string	No	Negative prompt text to exclude certain elements from the video
audio_url	string	No	URL of custom audio file for audio-visual synchronization. When provided, the `parameters.audio` parameter is ignored. Supported models: wan2.5-i2v-preview, wan2.6-i2v. First-and-last-frame mode (wan2.1-kf2v-plus) does not support this parameter

metadata.parameters Parameters:

Parameter	Type	Required	Description
resolution	string	No	Video resolution. Options: `"480P"` (wan2.5 only), `"720P"`, `"1080P"`. Note: The aspect ratio of the output video is determined by the input first frame image, with minor adjustments to meet technical requirements (width and height must be divisible by 16). First-and-last-frame mode (wan2.1-kf2v-plus) is fixed at 720P
prompt_extend	boolean	No	Enable intelligent prompt rewriting
duration	integer	No	Video duration in seconds. Options: `5`, `10`, and `15` (`15` is available on wan2.6 only). First-and-last-frame mode (wan2.1-kf2v-plus) is fixed at 5 seconds
audio	boolean	No	Enable automatic dubbing/background audio generation. When `input.audio_url` is not provided, setting to `true` will automatically generate matching background audio or music. Supported models: wan2.5-i2v-preview, wan2.6-i2v. Note: wan2.2 and earlier versions output only silent videos. First-and-last-frame mode (wan2.1-kf2v-plus) does not support this parameter
watermark	boolean	No	Add watermark to the video
seed	integer	No	Random seed for generation reproducibility. Using the same seed can produce similar results
shot_type	string	No	Specifies the shot type of the generated video, i.e., whether the video consists of a single continuous shot or multiple switched shots. Options: `"single"` (default, outputs a single-shot video) or `"multi"` (outputs a multi-shot video). Supported model: wan2.6-i2v. Note: This parameter takes effect only when `prompt_extend` is set to `true`. Parameter priority: `shot_type > prompt`. First-and-last-frame mode (wan2.1-kf2v-plus) does not support this parameter
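
The model-specific constraints in the table above can be expressed as a client-side check for the first-frame models. This is a sketch under assumptions: the name `check_parameters` is hypothetical, models are matched by name prefix, and the table is read as saying that `"480P"` applies to wan2.5 only and a duration of `15` applies to wan2.6 only. The server remains the authority on which values are accepted.

```python
from typing import List


def check_parameters(model: str, parameters: dict) -> List[str]:
    """Return a list of problems found in metadata.parameters (empty if none)."""
    problems = []

    resolution = parameters.get("resolution")
    if resolution is not None:
        if resolution not in ("480P", "720P", "1080P"):
            problems.append(f"unknown resolution {resolution!r}")
        elif resolution == "480P" and not model.startswith("wan2.5"):
            problems.append("480P is listed as wan2.5 only")

    duration = parameters.get("duration")
    if duration is not None:
        if duration not in (5, 10, 15):
            problems.append(f"unknown duration {duration!r}")
        elif duration == 15 and not model.startswith("wan2.6"):
            problems.append("a duration of 15 is listed as wan2.6 only")

    shot_type = parameters.get("shot_type")
    if shot_type is not None:
        if shot_type not in ("single", "multi"):
            problems.append(f"unknown shot_type {shot_type!r}")
        elif model != "wan2.6-i2v":
            problems.append("shot_type is listed as supported by wan2.6-i2v only")
        elif not parameters.get("prompt_extend"):
            problems.append("shot_type takes effect only when prompt_extend is true")

    return problems
```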

Audio Parameter Notes:

Audio behavior is controlled by the input.audio_url and parameters.audio parameters. Priority: audio_url > audio. Three modes are supported:

Generate silent video: Do not pass audio_url, and set audio to false
Automatically generate audio: Do not pass audio_url, and set audio to true (the model automatically generates matching background audio or music based on the prompt and video content)
Use custom audio: Pass audio_url (the audio parameter is ignored, and the video content attempts to align with the audio content, such as lip movements and rhythm)
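
The priority rule above (audio_url > audio) can be written as a small decision function. The function name and the returned labels are hypothetical and serve only to illustrate which of the three modes a given combination selects.

```python
from typing import Optional


def audio_mode(audio_url: Optional[str], audio: bool) -> str:
    """Return which of the three audio modes a request selects."""
    if audio_url:
        # audio_url takes priority; the audio flag is ignored.
        return "custom"
    return "auto" if audio else "silent"
```

For example, `audio_mode("https://example.com/voice.mp3", False)` selects custom audio even though `audio` is false.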

1. Submit Video Generation Task

Endpoint:

POST /v1/video/generations

Request Headers:

Parameter	Type	Required	Description
Content-Type	string	Yes	application/json
Authorization	string	Yes	Bearer API_KEY

Response Example:

{
  "id": "...",
  "object": "video",
  "model": "wan2.5-i2v-preview",
  "status": "queued",
  "progress": 0,
  "created_at": 1765328779
}

Response Field Descriptions:

Field	Type	Description
id	string	Task ID for subsequent task status queries
object	string	Object type, fixed as "video"
model	string	Model used to generate the video
status	string	Task status, initially "queued"
progress	integer	Task progress, 0-100
created_at	integer	Task creation time as a Unix timestamp in seconds
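
A Python equivalent of the curl submission can be sketched with the standard library only. The helper names `build_submit_request` and `submit_task` are hypothetical, the base URL is copied from the curl examples in this document, and error handling for non-2xx responses is omitted for brevity.

```python
import json
import urllib.request

BASE_URL = "https://computevault.unodetech.xyz"  # taken from the curl examples


def build_submit_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST /v1/video/generations request without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/video/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit_task(api_key: str, body: dict) -> str:
    """Send the request and return the task ID from the "id" response field."""
    with urllib.request.urlopen(build_submit_request(api_key, body)) as response:
        return json.load(response)["id"]
```

The returned ID is the value to use as task_id when querying the task status.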

2. Query Task Status

Complete Request:

curl -X GET "https://computevault.unodetech.xyz/v1/video/generations/TASK_ID" \
  -H "Authorization: Bearer API_KEY"

Endpoint:

GET /v1/video/generations/{task_id}

Request Headers:

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer API_KEY

Path Parameters:

Parameter	Type	Required	Description
task_id	string	Yes	Task ID

Response Example (Processing):

{
  "code": "success",
  "message": "",
  "data": {
    "task_id": "...",
    "action": "textGenerate",
    "status": "IN_PROGRESS",
    "fail_reason": "",
    "submit_time": 1765328779,
    "start_time": 1765328794,
    "finish_time": 0,
    "progress": "30%",
    "data": {
      "output": {
        "scheduled_time": "2025-12-10 09:06:19.749",
        "submit_time": "2025-12-10 09:06:19.731",
        "task_id": "...",
        "task_status": "RUNNING"
      },
      "request_id": "..."
    }
  }
}

Response Example (Success):

{
  "code": "success",
  "message": "",
  "data": {
    "task_id": "...",
    "action": "textGenerate",
    "status": "SUCCESS",
    "fail_reason": "<OUTPUT_URL>",
    "submit_time": 1765328779,
    "start_time": 1765328794,
    "finish_time": 1765328947,
    "progress": "100%",
    "data": {
      "output": {
        "actual_prompt": "<EDITED_PROMPT>",
        "end_time": "2025-12-10 09:08:53.863",
        "orig_prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
        "scheduled_time": "2025-12-10 09:06:19.749",
        "submit_time": "2025-12-10 09:06:19.731",
        "task_id": "...",
        "task_status": "SUCCEEDED",
        "video_url": "<OUTPUT_URL>"
      },
      "request_id": "...",
      "usage": {
        "video_count": 1,
        "video_duration": 5,
        "video_ratio": "1920*1080"
      }
    }
  }
}

You can retrieve the video URL from the data.data.output.video_url field.

Response Example (Failed):

{
  "code": "success",
  "message": "",
  "data": {
    "task_id": "...",
    "action": "textGenerate",
    "status": "FAILURE",
    "fail_reason": "task failed, code: InvalidParameter , message: image_url must provided",
    "submit_time": 1765407269,
    "start_time": 1765407278,
    "finish_time": 1765407294,
    "progress": "100%",
    "data": {
      "output": {
        "code": "InvalidParameter",
        "end_time": "2025-12-11 06:54:49.934",
        "message": "image_url must provided",
        "scheduled_time": "2025-12-11 06:54:29.557",
        "submit_time": "2025-12-11 06:54:29.529",
        "task_id": "...",
        "task_status": "FAILED"
      },
      "request_id": "..."
    }
  }
}

Response Field Descriptions:

Field	Type	Description
code	string	Response status code, "success" indicates success
message	string	Response message
data	object	Task data object
data.task_id	string	Task ID
data.status	string	Task status: IN_PROGRESS, SUCCESS, FAILURE
data.progress	string	Task progress percentage
data.data.output.video_url	string	Video access URL (when task succeeds)
data.data.output.task_status	string	Task status: RUNNING, SUCCEEDED, FAILED
data.data.usage	object	Usage statistics (when task succeeds)
data.data.usage.video_count	integer	Number of videos generated
data.data.usage.video_duration	integer	Video duration (seconds)
data.data.usage.video_ratio	string	Video resolution as width*height, e.g., "1920*1080"
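
The three response shapes above can be reduced to a single result with a helper along these lines. This is a sketch: `extract_result` and the keys of the dictionary it returns are hypothetical, and it relies only on fields that appear in the response examples.

```python
def extract_result(response: dict) -> dict:
    """Summarize a status query response as a small result dictionary."""
    task = response["data"]
    status = task["status"]
    if status == "SUCCESS":
        # The video URL is located at data.data.output.video_url.
        return {"status": status, "video_url": task["data"]["output"]["video_url"]}
    if status == "FAILURE":
        return {"status": status, "error": task.get("fail_reason", "")}
    # Any other status is treated as still running.
    return {"status": status, "progress": task.get("progress", "")}
```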

Important Notes

Base64 Data Format: For first frame mode, base64 data must use data URI format: data:{MIME_TYPE};base64,{base64_data}, not plain base64 strings.
First and Last Frame Mode Limitations: The first and last frame fields for the wan2.1-kf2v-plus model only accept URLs, not base64-encoded data.
Model Selection:
- wan2.5-i2v-preview: Supports first frame mode image-to-video
- wan2.6-i2v: Supports first frame mode image-to-video
- wan2.1-kf2v-plus: Supports first and last frame mode image-to-video
Metadata: The request's metadata field can be used to write any field that exists in the official request format. For example, if you need to specify the official format's parameters.resolution in the request, use metadata.parameters.resolution. See official documentation for details about optional request parameters and their allowed values.
