AI Generator by Prompt (Text-to-Video)

Write a prompt, choose a model (Wan 2.2 or Wan 2.7), choose a resolution up to 1080p, and download your clips.

Written by Alex Mavr

Last updated about 8 hours ago

What the tool does

The prompt-based generator creates videos directly from text. You control the objects, appearance, settings, lighting, camera angles, composition, and style. Select a model for generation, enter a prompt, and choose the aspect ratio and video duration. Results are commercially licensed and can be sent downstream to any ZenCreator tool.

Interface tour

Model - select the model that best suits your needs:

Wan 2.2 Spicy: excellent quality, resolution up to 720p, 8 sec max duration
Wan 2.7 Spicy: good quality, resolution up to 1080p, 15 sec max duration

Prompt: Describe what you would like to see in the video. Pay attention to objects, details, clothing, poses, lighting, camera angles, mood, and the setting.

Aspect Ratio: Select the desired aspect ratio for the videos you are creating (for example, 1:1, 3:4, 9:16). This determines the width-to-height ratio of every output clip.

Resolution: Select the desired resolution: 720p (Wan 2.2 and Wan 2.7) or 1080p (Wan 2.7 only).

Duration: Select the desired video duration (5 to 8 seconds on Wan 2.2, 5 to 15 seconds on Wan 2.7).

Models overview

Wan 2.2 Spicy - 720p/Safety Filters: Minimal

Duration: 5 s or 8 s
Best for: all the benefits of the base model, plus additional training on a massive labeled dataset of NSFW content
Strengths: punchy detail, strong adherence to your reference image, good at dramatic camera moves and moody grading.
Trade-offs: less conservative filtering means you should keep prompts precise; anatomy and small details can drift if you push extremes.
Content notes: Safety Filters: Minimal

Wan 2.7 Spicy - 720p to 1080p/Safety Filters: Minimal

Duration: 5 s, 10 s, 15 s
Best for: all the benefits of the base model, plus additional training on a massive labeled dataset of NSFW content.
Strengths: improved temporal consistency, better prompt understanding, enhanced motion quality, supports 720p and 1080p.
Trade-offs: slower than Ultra Fast models; best results require clear prompts and high-quality input images.
Content notes: Safety Filters: Minimal
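
The limits above can be written down as a small lookup table, which is handy if you plan batches in a script or spreadsheet before opening the tool. The following is a minimal sketch only: the dictionary layout and the `check_settings` function are hypothetical and are not part of any ZenCreator API. It uses the discrete durations listed in this overview.

```python
# Limits copied from the Models overview; names and structure are illustrative.
MODEL_LIMITS = {
    "Wan 2.2 Spicy": {"resolutions": ("720p",), "durations_s": (5, 8)},
    "Wan 2.7 Spicy": {"resolutions": ("720p", "1080p"), "durations_s": (5, 10, 15)},
}


def check_settings(model: str, resolution: str, duration_s: int) -> list[str]:
    """Return a list of problems; an empty list means the combination is valid."""
    limits = MODEL_LIMITS.get(model)
    if limits is None:
        return [f"unknown model: {model}"]
    problems = []
    if resolution not in limits["resolutions"]:
        problems.append(f"{model} does not offer {resolution}")
    if duration_s not in limits["durations_s"]:
        allowed = ", ".join(f"{d} s" for d in limits["durations_s"])
        problems.append(f"{model} supports {allowed}; got {duration_s} s")
    return problems


if __name__ == "__main__":
    print(check_settings("Wan 2.7 Spicy", "1080p", 15))  # valid combination
    print(check_settings("Wan 2.2 Spicy", "1080p", 10))  # two problems reported
```

An empty list means the combination matches the table; each string in a non-empty list names one setting to change.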

Quick Pick

Need high-detail NSFW with dramatic camera moves and moody cinematic grading → Wan 2.2 (720p)
Need the best motion quality and temporal consistency in NSFW → Wan 2.7 (1080p, 5s / 10s / 15s)

Quick Start

Choose a Model based on your content type and quality needs
Write a clear Prompt describing the subject, clothing, lighting, mood, and environment
Set Aspect Ratio and Resolution
Set Duration based on your model choice
Click Start Generation, then review the results and Download them

Actioning results

Use the checkboxes on thumbnails to Select/Deselect.
Download Selected.
Send to another tool to continue the pipeline (Video Upscaler, Video-to-Video).
Retry Failed appears automatically if any frames mis-rendered.

Prompting tips for video

Focus on motion and camera: “gentle parallax, slow dolly-in, subtle hair flutter, cloth ripple, soft depth-of-field.”
Keep it to one idea per clip. If you want multiple motions, render separate versions (it's faster and cleaner).
For identity consistency, describe key facial/hair traits in the prompt and choose the same model/duration across the batch.

Best practices & pro notes

Start with 5-second clips—it’s faster and cheaper to finalize the prompt, style, lighting, and movement that way. Only after the visuals have been approved should you move on to 8-second (Wan 2.2) or 10–15-second (Wan 2.7) clips.
Use prompts that are as precise and structured as possible. Both Wan models understand descriptions very well, but with minimal filters they handle vague wording poorly, especially for complex poses and anatomy.
In a standard text-to-video prompt (without a reference image), provide a detailed description of the character’s appearance at the beginning of the prompt: age, body type, facial features, hairstyle, clothing, lighting, and mood. The more detailed the description, the more consistent and high-quality the result will be.
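
One way to keep prompts structured across a batch is to assemble them from named parts in a fixed order, with the character description first. The helper below is a rough sketch under that assumption; `build_prompt` and its parameter names are hypothetical, and the tool itself simply takes the final text.

```python
def build_prompt(character: str, scene: str, motion: str, camera: str, style: str = "") -> str:
    """Join prompt parts in a fixed order: character, scene, motion, camera, style."""
    parts = [character, scene, motion, camera, style]
    # Drop empty parts and trailing punctuation so the result reads as clean sentences.
    cleaned = [p.strip().rstrip(".,;") for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one prompt part is required")
    return ". ".join(cleaned) + "."


if __name__ == "__main__":
    print(
        build_prompt(
            character="A woman in her 30s, athletic build, short black hair, red wool coat",
            scene="rainy street at night, neon reflections, moody lighting",
            motion="subtle hair flutter, cloth ripple",
            camera="slow dolly-in, soft depth-of-field",
        )
    )
```

Reusing the same `character` text for every clip in a batch keeps the description identical from one generation to the next.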

Known limitations (and how to mitigate)

Anatomy and fine details can drift — use simple poses and clear prompts.
Motion may flicker, especially on Wan 2.2 — reduce camera movement or keep clips short.
Results can vary in text-to-video — generate several versions and pick the best.

FAQ

1. Which model is better, Wan 2.2 or Wan 2.7?

Wan 2.7 offers better motion quality, higher resolution (up to 1080p), and more stable frame rates. Wan 2.2 excels in sharpness and detail and maintains a more consistent visual style, but its motion is less smooth. Choose Wan 2.7 for most situations.

2. Can these models be used for NSFW content?

Yes. Both Wan 2.2 Spicy and Wan 2.7 Spicy have Minimal Safety Filters and are well-suited for NSFW content. Wan 2.7 generally produces higher-quality results in complex scenes.

3. Why is the quality of text-to-video lower than generation from a reference image?

Without a starting image, the model relies solely on the text description. That’s why it’s important to describe the character’s appearance, lighting, and style in great detail at the beginning of the prompt.

4. How can I make a video longer than 8–15 seconds?

Longer videos are created using a chained workflow.
This is done with Start Frame and End Frame:
Generate the first clip.
Use the last frame of that clip as the Start Frame for the next generation.
Optionally set an End Frame to guide the final state.
Repeat the process to extend the sequence.
This approach allows you to:
maintain visual and motion continuity
build videos of virtually any length
avoid abrupt motion changes
This workflow is supported by Seedance Pro, Wan, and Kling Start/End Frame models.
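
Before starting a chain, it can help to work out how many clips you need and how long each should be. The sketch below assumes the chained clips are simply played back to back, and it picks the longest allowed duration first. `plan_chain` is a hypothetical helper for planning only; it does not call any ZenCreator feature.

```python
def plan_chain(target_s: int, allowed_s: tuple[int, ...]) -> list[int]:
    """Split a target length into clip durations, longest first.

    Each clip after the first would use the previous clip's last frame
    as its Start Frame.
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    if not allowed_s or any(d <= 0 for d in allowed_s):
        raise ValueError("allowed_s must contain positive durations")
    options = sorted(allowed_s, reverse=True)
    plan = []
    remaining = target_s
    while remaining > 0:
        # Longest allowed duration that still fits; if none fits, use the
        # shortest one and accept a small overshoot.
        fit = next((d for d in options if d <= remaining), options[-1])
        plan.append(fit)
        remaining -= fit
    return plan


if __name__ == "__main__":
    print(plan_chain(40, (5, 10, 15)))  # durations from the Wan 2.7 overview
    print(plan_chain(20, (5, 8)))       # durations from the Wan 2.2 overview
```

When the target is not an exact sum of allowed durations, the plan runs slightly long, and the extra seconds can be trimmed in post-production.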

5. How can we address anatomical issues and artifacts?

Use simple poses, avoid excessive camera movement, and generate several versions. Minor flaws are easier to fix in post-production.