Veo Model Image-to-Video API Documentation

Veo is a high-quality image-to-video generation model developed by Google. This document specifies the API for generating video from images with Veo models. All video generation calls use the same /v1/video/generations endpoint; the parameters vary by use case. Image data is provided as a base64-encoded string.

Supported Models

Currently supported models include:

Model	Description
veo-3.0-generate-001	Veo 3.0 image-to-video generation model
veo-3.0-fast-generate-001	Veo 3.0 fast image-to-video generation model
veo-3.1-generate-001	Veo 3.1 image-to-video generation model
veo-3.1-fast-generate-001	Veo 3.1 fast image-to-video generation model

Overview

Image-to-video generation with Veo is asynchronous. A request goes through three steps:

Submit Task: Send an image and text prompt to create a video generation task
Query Status: Query generation progress and status through task ID
Get Results: Retrieve the generated video file after task completion

Task Status Flow

queued → in_progress → completed
                ↓
            failed

queued: Task has been submitted and is waiting to be processed
in_progress: Task is being processed
completed: Task completed successfully, video has been generated
failed: Task failed
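
The status flow above lends itself to a simple polling loop. The following is a minimal sketch, not a reference client: wait_for_task, fetch_status, and the interval and timeout defaults are hypothetical names and values chosen for illustration. The terminal set includes both "completed" (from the flow above) and "succeeded" (the value shown in the standard-format response examples in this document), and any other status is treated as still pending.

```python
import time
from typing import Callable

# Terminal states. "completed" comes from the status flow; "succeeded" is the
# value shown in the standard-format response examples. Anything else
# (queued, in_progress, processing, ...) is treated as "keep waiting".
TERMINAL_STATUSES = {"completed", "succeeded", "failed"}


def wait_for_task(
    fetch_status: Callable[[str], str],
    task_id: str,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(task_id) until a terminal status is seen, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status(task_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_seconds}s")
        sleep(interval_seconds)
```

Here fetch_status stands for whatever function performs the status query and returns the status string; passing it in keeps the loop independent of the HTTP client.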

API List

Method	Path	Description
POST	/v1/video/generations	Submit video generation task (standard format)
GET	/v1/video/generations/{task_id}	Query task status (standard format)
POST	/v1/videos	Submit video generation task
GET	/v1/videos/{task_id}	Query task status
GET	/v1/videos/{task_id}/content	Get video content (streaming download)
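
The content endpoint in the table above is described as a streaming download, so a client can copy the response to disk in chunks instead of loading the whole video into memory. The sketch below is one way to do that with the Python standard library. It assumes the content endpoint accepts the same Bearer token as the other endpoints, which this document does not state explicitly; download_video and its opener parameter are hypothetical names.

```python
import shutil
import urllib.request
from typing import Callable


def download_video(
    content_url: str,
    api_key: str,
    dest_path: str,
    opener: Callable = urllib.request.urlopen,
) -> int:
    """Stream the response body to dest_path and return the number of bytes written."""
    request = urllib.request.Request(
        content_url,
        # Assumption: the content endpoint requires the same Authorization header.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with opener(request) as response, open(dest_path, "wb") as out:
        # Copy in 64 KiB chunks so the full video is never held in memory.
        shutil.copyfileobj(response, out, length=64 * 1024)
        return out.tell()
```

A call would look like download_video("https://computevault.unodetech.xyz/v1/videos/TASK_ID/content", "API_KEY", "out.mp4").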

Usage Examples

1. Basic Image-to-Video

The simplest form of image-to-video generation uses a single image as the first frame.

Request Body:

{
  "model": "veo-3.1-generate-001",
  "prompt": "A cat playing piano in a beautiful garden",
  "image": "<BASE64_ENCODED_IMAGE_DATA>",
  "metadata": {}
}
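
As an illustration, the request body above could be assembled from an image file as follows. This is a sketch under one assumption: the image field takes the raw base64 string with no data: URI prefix, which matches the placeholder shown but is not spelled out in this document. The helper names encode_image_file and build_basic_request are hypothetical.

```python
import base64
import json


def encode_image_file(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("ascii")


def build_basic_request(model: str, prompt: str, image_base64: str) -> dict:
    """Build the basic image-to-video body: one image used as the first frame."""
    return {
        "model": model,
        "prompt": prompt,
        "image": image_base64,
        "metadata": {},
    }


def to_json(body: dict) -> str:
    """Serialize the body, e.g. to save it as a file for curl's -d @file option."""
    return json.dumps(body, indent=2)
```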

2. First and Last Frames

The image in the image field specifies the first frame of the video. The image in metadata.lastFrame specifies the last frame. This allows you to control both the starting and ending frames of the generated video.

Note: This feature is only supported by Veo 3.1 models.

Request Body:

{
  "model": "veo-3.1-generate-001",
  "prompt": "A cat playing piano in a beautiful garden",
  "image": "<BASE64_ENCODED_IMAGE_DATA>",
  "metadata": {
    "lastFrame": "<BASE64_ENCODED_IMAGE_DATA>"
  }
}
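
A client may want to catch an unsupported model before submitting. The sketch below builds the first-and-last-frame body and rejects model names that do not start with "veo-3.1-". That prefix check is an assumption derived from the model list in this document, and build_first_last_frame_request is a hypothetical helper; the server remains the authority on what is accepted.

```python
import base64


def build_first_last_frame_request(
    model: str, prompt: str, first_frame: bytes, last_frame: bytes
) -> dict:
    """Build a body whose image is the first frame and metadata.lastFrame the last."""
    # Assumption: every Veo 3.1 model name starts with "veo-3.1-".
    if not model.startswith("veo-3.1-"):
        raise ValueError(f"lastFrame is only supported by Veo 3.1 models, got {model!r}")
    return {
        "model": model,
        "prompt": prompt,
        "image": base64.b64encode(first_frame).decode("ascii"),
        "metadata": {
            "lastFrame": base64.b64encode(last_frame).decode("ascii"),
        },
    }
```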

3. Reference Images

Reference images are passed as an array in metadata.referenceImages, which holds up to 3 elements. Each element is an object with two fields: image (base64-encoded image data) and referenceType (a string, either "asset" or "style").

Note: This feature is only supported by veo-3.1-generate-001.

Request Body:

{
  "model": "veo-3.1-generate-001",
  "prompt": "A cat playing piano in a beautiful garden",
  "image": "<BASE64_ENCODED_IMAGE_DATA>",
  "metadata": {
    "referenceImages": [
      {
        "image": "<BASE64_ENCODED_IMAGE_DATA>",
        "referenceType": "asset"
      },
      {
        "image": "<BASE64_ENCODED_IMAGE_DATA>",
        "referenceType": "style"
      }
    ]
  }
}
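
The constraints on reference images (at most 3 elements, referenceType of "asset" or "style") can be checked on the client before the request is sent. The following sketch shows one way to build the array; build_reference_images is a hypothetical helper, and the checks only mirror the limits stated in this document.

```python
import base64
from typing import List, Tuple

REFERENCE_TYPES = ("asset", "style")
MAX_REFERENCE_IMAGES = 3


def build_reference_images(images: List[Tuple[bytes, str]]) -> List[dict]:
    """Turn (image_bytes, reference_type) pairs into metadata.referenceImages entries."""
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    entries = []
    for image_bytes, reference_type in images:
        if reference_type not in REFERENCE_TYPES:
            raise ValueError(f"referenceType must be one of {REFERENCE_TYPES}")
        entries.append(
            {
                "image": base64.b64encode(image_bytes).decode("ascii"),
                "referenceType": reference_type,
            }
        )
    return entries
```

The result would be placed under metadata.referenceImages in a request whose model is veo-3.1-generate-001.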

Request Parameters:

Parameter	Type	Required	Description
model	string	Yes	Model name, e.g., veo-3.1-generate-001
prompt	string	Yes	Text prompt describing the video content to be generated
image	string	Yes	Base64-encoded image data for the first frame
metadata	object	No	Extended parameters object

metadata Parameters:

Parameter	Type	Required	Description
aspectRatio	string	No	Video aspect ratio, options: "16:9", "9:16"
durationSeconds	number	No	Video duration (seconds), options: 4, 6, 8
negativePrompt	string	No	Negative prompt describing content not desired in the video
personGeneration	string	No	Person generation strategy, options: "allow_all" (text-to-video), "allow_adult" (image-to-video)
resolution	string	No	Video resolution, e.g., "1080p", "720p"
sampleCount	number	No	Number of videos to generate, default 1
storageUri	string	No	Google Cloud Storage URI for storing generated videos
lastFrame	string	No	Base64-encoded image data for the last frame (Veo 3.1 models only)
referenceImages	array	No	Array of reference images, up to 3 elements (veo-3.1-generate-001 only)

referenceImages Array Elements:

Parameter	Type	Required	Description
image	string	Yes	Base64-encoded image data
referenceType	string	Yes	Reference type, options: "asset" or "style"
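
Several metadata parameters accept only a fixed set of values, so a small client-side check can catch typos before a task is submitted. The sketch below covers only the parameters whose options are listed exhaustively in the tables above; resolution is skipped because its values are given as examples. validate_metadata is a hypothetical helper, and the server may apply additional rules.

```python
from typing import List

# Options copied from the metadata parameter table.
ALLOWED_VALUES = {
    "aspectRatio": ("16:9", "9:16"),
    "durationSeconds": (4, 6, 8),
    "personGeneration": ("allow_all", "allow_adult"),
}


def validate_metadata(metadata: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in metadata and metadata[key] not in allowed:
            problems.append(f"{key}={metadata[key]!r} is not one of {allowed}")
    return problems
```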

1. Submit Video Generation Task

Complete Request:

curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" -H "Content-Type: application/json" -H "Authorization: Bearer API_KEY" -d @veoImageToVideoTest.json

Endpoint:

POST /v1/video/generations

Request Headers:

Parameter	Type	Required	Description
Content-Type	string	Yes	application/json
Authorization	string	Yes	Bearer API_KEY

Response Example:

{
  "task_id": "TASK_ID"
}

Response Field Descriptions:

Field	Type	Description
task_id	string	Task ID for subsequent task status queries
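
The curl command above can be reproduced with the Python standard library. This is a minimal sketch: the host is taken from the curl example, build_submit_request and submit_task are hypothetical helper names, and error handling (HTTP errors, non-JSON responses) is left out.

```python
import json
import urllib.request

BASE_URL = "https://computevault.unodetech.xyz"  # host taken from the curl example


def build_submit_request(
    api_key: str, body: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /v1/video/generations request without sending it."""
    return urllib.request.Request(
        f"{base_url}/v1/video/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit_task(api_key: str, body: dict, timeout: float = 60.0) -> str:
    """Send the request and return the task_id from the JSON response."""
    request = build_submit_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["task_id"]
```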

2. Query Task Status

Complete Request (Standard Format):

curl -X GET "https://computevault.unodetech.xyz/v1/video/generations/TASK_ID" -H "Authorization: Bearer API_KEY"

Endpoint:

GET /v1/video/generations/{task_id}

Request Headers:

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer API_KEY

Path Parameters:

Parameter	Type	Required	Description
task_id	string	Yes	Task ID
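
A status query in Python might look like the sketch below. It percent-encodes the task ID before placing it in the path, which is a precaution assumed here rather than a documented requirement: task IDs can look like base64 strings, and such strings may contain characters that are not safe in a URL path. status_url and query_task are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://computevault.unodetech.xyz"  # host taken from the curl example


def status_url(task_id: str, base_url: str = BASE_URL) -> str:
    """Build the GET /v1/video/generations/{task_id} URL."""
    # Assumption: percent-encoding the ID is acceptable to the server.
    return f"{base_url}/v1/video/generations/{urllib.parse.quote(task_id, safe='')}"


def query_task(api_key: str, task_id: str, timeout: float = 30.0) -> dict:
    """Fetch the task status and return the parsed JSON response."""
    request = urllib.request.Request(
        status_url(task_id),
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```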

Response Example (Processing):

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": null,
    "status": "processing",
    "task_id": "TASK_ID",
    "url": ""
  }
}

Response Example (Success):

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": null,
    "status": "succeeded",
    "task_id": "TASK_ID",
    "url": "https://computevault.unodetech.xyz/v1/videos/TASK_ID/content"
  }
}

Note: Depending on the AI service provider, the video will be returned either as base64-encoded data in the bytes_base64_encoded field (Vertex) or via a content URL in the url field (Gemini).
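
Because the video can arrive in either of two fields, client code needs to check both. The sketch below returns the decoded bytes when bytes_base64_encoded is populated and the URL otherwise. extract_video is a hypothetical helper; it also verifies that url actually looks like an HTTP(S) link, since elsewhere in this document the url field carries a message instead of a link.

```python
import base64
from typing import Optional, Tuple


def extract_video(response: dict) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (video_bytes, video_url); at most one of the two is not None."""
    data = response.get("data") or {}
    if data.get("status") != "succeeded":
        return None, None
    encoded = data.get("bytes_base64_encoded") or ""
    if encoded:
        # Inline base64 payload.
        return base64.b64decode(encoded), None
    url = data.get("url") or ""
    if url.startswith(("http://", "https://")):
        # Content URL to download separately.
        return None, url
    return None, None
```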

Response Example (Failed):

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": null,
    "status": "failed",
    "task_id": "TASK_ID",
    "url": "Reference to video does not support this mix of reference images."
  }
}

When a task fails, the url field contains the error message instead of a video URL.
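
Since the error text may appear in url while data.error stays null, a client can look in more than one place. The following sketch prefers data.error when it is populated and falls back to data.url, then to the top-level message. failure_message is a hypothetical helper, and the shape of a non-null data.error object is not documented here, so it is simply converted to a string.

```python
from typing import Optional


def failure_message(response: dict) -> Optional[str]:
    """Return an error description for a failed task, or None if the task did not fail."""
    data = response.get("data") or {}
    if data.get("status") != "failed":
        return None
    error = data.get("error")
    if error:
        # The structure of this object is not documented; stringify it as-is.
        return str(error)
    return data.get("url") or response.get("message") or "unknown error"
```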

Response Field Descriptions:

Field	Type	Description
code	string	Response status code; "success" indicates that the query itself succeeded (the task may still have failed, see data.status)
data	object	Task data object
data.task_id	string	Task ID
data.status	string	Task status: queued, in_progress, succeeded, failed (the processing example above reports "processing")
data.format	string	Video format, e.g., "mp4"
data.url	string	Video access URL (when task succeeds), or error message (when task fails)
data.bytes_base64_encoded	string	Base64-encoded video data (when available)
data.error	object	Error information (when task fails)
message	string	Error message

Important Notice

Note: Under Google's Responsible AI guidelines, a task routed through the Gemini channel may report success while its video output is blocked. In that case, the filtering details appear in the metadata.rai_media_filtered_count and metadata.rai_media_filtered_reasons fields, as in the following example:

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": {
      "rai_media_filtered_count": 1,
      "rai_media_filtered_reasons": ["Sorry, we can't create videos with real people's names or likenesses. Please remove the celebrity reference and try again."]
    },
    "status": "succeeded",
    "task_id": "bW9kZWxzL3Zlby0zLjAtZmFzdC1nZW5lcmF0ZS0wMDEvb3BlcmF0aW9ucy9hd2IxZDhsNDVydGM",
    "url": "Sorry, we can't create videos with real people's names or likenesses. Please remove the celebrity reference and try again."
  }
}
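
A "succeeded" status therefore does not guarantee a usable video, and a client should inspect the metadata before treating url as a link. The sketch below reads the two filter fields from the example above; rai_filter_info is a hypothetical helper, and it assumes the fields are absent or null when nothing was filtered.

```python
from typing import List, Tuple


def rai_filter_info(response: dict) -> Tuple[int, List[str]]:
    """Return (filtered_count, reasons); (0, []) means no filtering was reported."""
    data = response.get("data") or {}
    metadata = data.get("metadata") or {}  # metadata is null in unfiltered responses
    count = metadata.get("rai_media_filtered_count") or 0
    reasons = list(metadata.get("rai_media_filtered_reasons") or [])
    return count, reasons
```

A caller would check count > 0 before attempting to download anything from data.url.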
