The goal is to help you unlock the full expressive power of the model: emotional acting, realistic movements, multi-style performance, and context-aware reactions.
1. How Prompts Work in Lip Sync
Lip Sync uses three inputs:
Image — identity, appearance, environment reference.
Audio — timing + speech + emotional cues + semantics.
Prompt — everything the audio cannot explicitly define:
style, mood, acting instructions, camera motion, scenario, personality, intensity.
OmniHuman-1.5 reads prompts as direction for the actor, similar to giving instructions on a movie set.
Prompts can influence:
emotional style
expression intensity
micro-gestures
acting logic
camera behavior
character behavior
style (film / social media / dramatic / natural)
posture
mood
environment feeling
They cannot override:
identity from the input image
timing of the lip motion
semantic correctness of speech
2. Core Principles from OmniHuman-1.5
✓ Principle 1: Prompt = Acting Instructions
Prompts work best when they describe:
tone
attitude
emotional state
intent
subtle behaviors
Example:
“Warm, sincere tone, soft smile, thoughtful eye movement, slight nod.”
✓ Principle 2: Respect the Audio
If the audio is sad/slow but the prompt says “energetic influencer style,” the result becomes unnatural.
Prompts must align with the emotion and rhythm of the audio.
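One way to catch a mismatch before generating is a quick keyword check. The sketch below is only an illustration: the keyword sets, the function name conflicts_with_audio, and the "low" / "high" energy label (which you assign to your audio by ear) are all assumptions, not part of OmniHuman-1.5 or Lip Sync.

```python
# Hypothetical keyword sets; extend them with your own prompt vocabulary.
ENERGY_WORDS = {
    "low": {"calm", "soft", "gentle", "slow", "sad", "subtle"},
    "high": {"energetic", "fast", "lively", "high-energy", "bold"},
}


def conflicts_with_audio(prompt: str, audio_energy: str) -> list[str]:
    """Return prompt words that belong to the opposite energy level of the audio.

    audio_energy is a label you choose yourself after listening: "low" or "high".
    """
    if audio_energy not in ENERGY_WORDS:
        raise ValueError("audio_energy must be 'low' or 'high'")
    opposite = "high" if audio_energy == "low" else "low"
    # Strip surrounding punctuation so "fast," still matches "fast".
    words = {w.strip(".,!?\"'").lower() for w in prompt.split()}
    return sorted(words & ENERGY_WORDS[opposite])


# Slow, sad audio paired with an energetic prompt is flagged.
print(conflicts_with_audio("Energetic influencer style, fast delivery.", "low"))
```

An empty list means the check found no obviously opposing words; it does not guarantee the prompt fits the audio.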
✓ Principle 3: Don’t Over-direct
Avoid complex or contradictory instructions.
❌ “Fast camera spin, walking through a forest, laughing loudly, whispering seductively.”
✔ “Soft natural expression, slight head movement, gentle smile.”
✓ Principle 4: Use prompts to enhance, not replace
The audio defines the core behavior; the prompt defines the style.
3. Prompt Structure (Recommended Template)
A best-practice prompt should include these blocks:
Emotion + Tone
Facial Expression
Eye & Head Movement
Body Language (light gestures)
Persona / Character Style
Camera Behavior
Scene Mood (optional)
Example full prompt:
“Calm and confident tone. Soft smile with friendly eye contact. Slight head tilt and relaxed micro-gestures. Social-media style delivery with warm lighting. Minimal camera drift forward.”
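If you write many prompts, the template can be kept as a small data structure so that no block is forgotten and the order stays consistent. This is a minimal sketch, not an official tool: the class LipSyncPrompt and its field names are hypothetical and simply mirror the blocks listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class LipSyncPrompt:
    """Hypothetical container; field order follows the template blocks."""
    emotion_tone: str = ""
    facial_expression: str = ""
    eye_head_movement: str = ""
    body_language: str = ""
    persona_style: str = ""
    camera_behavior: str = ""
    scene_mood: str = ""  # optional block

    def render(self) -> str:
        """Join the non-empty blocks into one prompt, one sentence per block."""
        parts = []
        for f in fields(self):
            text = getattr(self, f.name).strip().rstrip(".")
            if text:
                parts.append(text + ".")
        return " ".join(parts)


prompt = LipSyncPrompt(
    emotion_tone="Calm and confident tone",
    facial_expression="Soft smile with friendly eye contact",
    eye_head_movement="Slight head tilt",
    body_language="Relaxed micro-gestures",
    persona_style="Social-media style delivery",
    camera_behavior="Minimal camera drift forward",
)
print(prompt.render())
```

Empty blocks are skipped, so leaving scene_mood unset produces a shorter prompt rather than a dangling sentence.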
4. Product Highlights → How to Use Them in Prompts
Based on the official Product Highlights of OmniHuman-1.5:
⭐ Audio Comprehension
The character can perform actions based on speech meaning.
Use prompts to complement that.
Example:
“React with curiosity when asking questions. Smile when mentioning something positive.”
⭐ Precise Response to Instructions
The model follows motion, acting, and camera directions.
Prompt example:
“Slight camera push-in, gentle head movement, expressive eyes.”
⭐ Vivid Emotional Performance
You can request specific emotional levels.
Examples:
“Extremely emotional, teary eyes, trembling lips.”
“Light emotional tone, soft and warm.”
“High-energy influencer vibe, expressive gestures.”
⭐ Multi-person Scenario Support
(Lip Sync currently supports only a single person, but the logic still applies.)
You can still reference imaginary second subjects:
“Looking slightly off-camera as if talking to someone in the room.”
⭐ Diverse Styles
The model supports realistic, cinematic, dramatic, vlog, music-video, and other styles.
Examples:
“Cinematic, shallow depth of field.”
“TikTok influencer style.”
“Documentary interview tone.”
“Dramatic theater-style expression.”
5. Scenario-Based Prompting (Based on Official Scenario Cases)
From OmniHuman-1.5 Scenario Cases:
Film / Short Video, Fantasy Vlogs, AI Music Content, UGC Content.
Below is prompting guidance adapted for Lip Sync.
Scenario Type 1: Film, TV & Cinematic Narration
Goal:
High drama, strong emotion, strong performance.
Prompt Examples:
“Dramatic close-up, slow emotional delivery, intense eye contact, slight camera drift.”
“Film-grade performance, deep breath before speaking, expressive eyebrows, cinematic color tone.”
“Serious and cold tone, minimal movement, eyes locked on camera.”
Best Uses:
monologues
storytelling
emotional speeches
Scenario Type 2: Fantasy Vlog / Character Roleplay
Goal:
Become an in-world persona.
Prompt Examples:
“Playful fantasy character, curious tone, light magical wonder in expression.”
“Storyteller mood, dramatic pauses, expressive gestures.”
“Warm, friendly guide talking to the viewer in a fantasy environment.”
Best Uses:
lore videos
character introductions
story expansions
Scenario Type 3: AI MV / Music Content
Goal:
Music-video style charisma.
Prompt Examples:
“Strong emotional delivery, rhythmic head movements matching the beat.”
“Bold and expressive performance, open smile, confident vibe.”
“MV-style dramatic lighting, slow-motion feel, intense expressions.”
Best Uses:
lip-sync music videos
expressive emotional content
reaction videos
Scenario Type 4: UGC / Social Media Content
Goal:
Natural, casual, authentic communication.
Prompt Examples:
“TikTok influencer tone, friendly smile, natural micro-gestures.”
“Talking casually as if recording from home, relaxed posture.”
“Fast, energetic, expressive eyes, engaging tone.”
Best Uses:
product promos
reactions
influencer-style content
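The four scenario types can be kept as a small lookup table so a starting prompt is one call away. The sketch below is an assumption about how you might organize them: the dictionary keys and the helper pick_prompt are made up for illustration, and the prompt strings are taken from the examples above.

```python
# Hypothetical scenario keys mapped to example prompts from this guide.
SCENARIO_PROMPTS = {
    "film": [
        "Dramatic close-up, slow emotional delivery, intense eye contact, slight camera drift.",
        "Serious and cold tone, minimal movement, eyes locked on camera.",
    ],
    "fantasy_vlog": [
        "Playful fantasy character, curious tone, light magical wonder in expression.",
        "Storyteller mood, dramatic pauses, expressive gestures.",
    ],
    "music": [
        "Strong emotional delivery, rhythmic head movements matching the beat.",
        "Bold and expressive performance, open smile, confident vibe.",
    ],
    "ugc": [
        "TikTok influencer tone, friendly smile, natural micro-gestures.",
        "Talking casually as if recording from home, relaxed posture.",
    ],
}


def pick_prompt(scenario: str, index: int = 0) -> str:
    """Return a starting prompt for a scenario; index wraps around the list."""
    if scenario not in SCENARIO_PROMPTS:
        raise ValueError(
            f"unknown scenario {scenario!r}; choose from {sorted(SCENARIO_PROMPTS)}"
        )
    options = SCENARIO_PROMPTS[scenario]
    return options[index % len(options)]


print(pick_prompt("ugc"))
```

Treat the returned text as a starting point and adjust it to match the emotion and rhythm of your audio.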
6. Advanced Prompt Techniques
1. Emotion Intensity Levels
Try specifying strength:
“Very subtle expression.”
“Medium emotion, natural.”
“High-intensity acting, dramatic facial movement.”
2. Camera Movement
OmniHuman-1.5 supports camera motion.
Examples:
“Slow push-in.”
“Gentle left-right drift.”
“Light handheld feel.”
“No camera movement.”
3. Micro-Gestures
Micro-gesture cues are often very effective:
“Soft blinking.”
“Small head nods.”
“Slight eyebrow movement.”
“Gentle smile transitions.”
4. Persona & Character Type
A persona defines the overall vibe:
“Teacher explaining calmly.”
“Influencer speaking confidently.”
“Villain with slow, cold delivery.”
“Friendly AI assistant.”
5. Scene Mood
Even if the background doesn’t change, mood affects acting:
“Cozy and warm, relaxed vibe.”
“Dark and serious tone.”
“Professional business-style delivery.”
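To compare these techniques side by side, it can help to generate a small grid of prompt variations and render each one with the same image and audio. The following is a minimal sketch under that assumption: the helper name variations and the chosen option lists are illustrative, and the phrases come from the examples above.

```python
from itertools import product

INTENSITY = [
    "Very subtle expression",
    "Medium emotion, natural",
    "High-intensity acting, dramatic facial movement",
]
CAMERA = ["Slow push-in", "No camera movement"]


def variations(base: str, *option_groups):
    """Yield the base prompt combined with one phrase from each option group."""
    for combo in product(*option_groups):
        sentences = [base.rstrip(".") + "."] + [c + "." for c in combo]
        yield " ".join(sentences)


prompts = list(variations("Teacher explaining calmly", INTENSITY, CAMERA))
for p in prompts:
    print(p)
```

Three intensity levels and two camera options give six prompts. Keeping each group small avoids the over-direction problem described in Principle 3.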
7. What to Avoid
❌ Overly long instructions
❌ Conflicting emotions
❌ Camera instructions that contradict the perspective of the input image
❌ Describing large physical actions the shot cannot support (walking, jumping, etc.)
❌ Demanding expressions that oppose the tone of the audio
❌ Asking to change identity
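Several of these pitfalls can be caught with a rough automated check before you submit a prompt. The sketch below is a heuristic only: the word limit, the conflicting pairs, the action list, and the identity phrases are hypothetical values chosen for illustration, and the function lint_prompt is not part of any official tooling.

```python
import re

MAX_WORDS = 40  # hypothetical threshold for "overly long"
CONFLICTING_PAIRS = [  # hypothetical examples of clashing cues
    ("laughing", "whispering"),
    ("calm", "energetic"),
    ("subtle", "high-intensity"),
]
LARGE_ACTIONS = {"walking", "jumping"}  # extend as needed
IDENTITY_PHRASES = ("different person", "change face", "change identity")


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means no known pitfall matched."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z][a-z-]*", text))
    warnings = []
    if len(text.split()) > MAX_WORDS:
        warnings.append("prompt is too long")
    for a, b in CONFLICTING_PAIRS:
        if a in words and b in words:
            warnings.append(f"conflicting cues: {a} / {b}")
    for action in sorted(LARGE_ACTIONS & words):
        warnings.append(f"large physical action: {action}")
    for phrase in IDENTITY_PHRASES:
        if phrase in text:
            warnings.append(f"identity change: {phrase}")
    return warnings


bad = "Fast camera spin, walking through a forest, laughing loudly, whispering seductively."
good = "Soft natural expression, slight head movement, gentle smile."
print(lint_prompt(bad))
print(lint_prompt(good))
```

The check cannot judge whether a prompt matches the audio or the image perspective; those still need a human listen and look.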
8. Ready-to-Use Prompt Examples
🔹 Natural Social Media Talk
“Friendly influencer tone, soft smile, relaxed micro-gestures, natural head movement, warm energy.”
🔹 Cinematic Monologue
“Serious emotional monologue, slow delivery, deep eye contact, dramatic lighting tone.”
🔹 Enthusiastic Vlogger
“Energetic, expressive gestures, big smile, lively eye movement, confident tone.”
🔹 Calm & Soft
“Warm, gentle tone, minimal movement, soft smile, subtle expressions.”
🔹 Dramatic Character Roleplay
“Intense roleplay tone, expressive eyes, theatrical pacing, controlled but dramatic gestures.”
🏁 Conclusion
Prompting in Lip Sync is about directing a digital actor.
The image defines who they are.
The audio defines what they say.
The prompt defines how they say it.
With the right combination, you can create:
cinematic storytelling
influencer videos
character vlogs
emotional performances
music-style lip syncs
fantasy roleplay
natural casual talking content
Use this guide as your foundation to build powerful, expressive videos with OmniHuman-1.5.
