🎯 Lip Sync — Prompting Guide (OmniHuman 1.5)

This article explains how to write effective prompts for Lip Sync, based on the official OmniHuman-1.5 prompt guidelines.

Updated over 2 months ago

The goal is to help you unlock the full expressive power of the model: emotional acting, realistic movements, multi-style performance, and context-aware reactions.

1. How Prompts Work in Lip Sync

Lip Sync uses three inputs:

  • Image — identity, appearance, environment reference.

  • Audio — timing + speech + emotional cues + semantics.

  • Prompt — everything the audio cannot explicitly define:
    style, mood, acting instructions, camera motion, scenario, personality, intensity.

OmniHuman 1.5 reads prompts as direction for the actor — similar to giving instructions on a movie set.

Prompts can influence:

  • emotional style

  • expression intensity

  • micro-gestures

  • acting logic

  • camera behavior

  • character behavior

  • style (film / social media / dramatic / natural)

  • posture

  • mood

  • environment feeling

They cannot override:

  • identity from the input image

  • timing of the lip motion

  • semantic correctness of speech

2. Core Principles from OmniHuman-1.5

✓ Principle 1: Prompt = Acting Instructions

Prompts are best when they describe:

  • tone

  • attitude

  • emotional state

  • intent

  • subtle behaviors

Example:

“Warm, sincere tone, soft smile, thoughtful eye movement, slight nod.”

✓ Principle 2: Respect the Audio

If the audio is sad/slow but the prompt says “energetic influencer style,” the result becomes unnatural.

Prompts must align with the emotion and rhythm of the audio.

✓ Principle 3: Don’t Over-direct

Avoid complex or contradictory instructions.

❌ “Fast camera spin, walking through a forest, laughing loudly, whispering seductively.”
✔ “Soft natural expression, slight head movement, gentle smile.”

✓ Principle 4: Use prompts to enhance, not replace

The audio defines the core behavior; the prompt defines the style.

3. Prompt Structure (Recommended Template)

A best-practice prompt should include these blocks:

  • Emotion + Tone

  • Facial Expression

  • Eye & Head Movement

  • Body Language (light gestures)

  • Persona / Character Style

  • Camera Behavior

  • Scene Mood (optional)

Example full prompt:

“Calm and confident tone. Soft smile with friendly eye contact. Slight head tilt and relaxed micro-gestures. Social-media style delivery with warm lighting. Minimal camera drift forward.”
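
If you generate prompts programmatically, the template can be treated as a set of named slots. The Python sketch below is one possible way to do that. `PromptBlocks` and `build_prompt` are hypothetical names chosen for illustration, not part of Lip Sync or OmniHuman-1.5. The helper only produces a text string and does not call any service.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PromptBlocks:
    """One field per template block (field names are illustrative assumptions)."""
    emotion_tone: Optional[str] = None
    facial_expression: Optional[str] = None
    eye_head_movement: Optional[str] = None
    body_language: Optional[str] = None
    persona_style: Optional[str] = None
    camera_behavior: Optional[str] = None
    scene_mood: Optional[str] = None  # marked optional in the template


def build_prompt(blocks: PromptBlocks) -> str:
    """Join the filled-in blocks into one prompt, in template order."""
    sentences = []
    for field in fields(blocks):
        value = getattr(blocks, field.name)
        if value:
            # Normalize each block to a single sentence ending in a period.
            sentences.append(value.strip().rstrip(".") + ".")
    return " ".join(sentences)


prompt = build_prompt(PromptBlocks(
    emotion_tone="Calm and confident tone",
    facial_expression="Soft smile with friendly eye contact",
    eye_head_movement="Slight head tilt",
    body_language="Relaxed micro-gestures",
    persona_style="Social-media style delivery",
    camera_behavior="Minimal camera drift forward",
))
print(prompt)
```

Blocks left empty are skipped, so the same helper works for short prompts that use only two or three blocks.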

4. Product Highlights → How to Use Them in Prompts

Based on the official Product Highlights of OmniHuman-1.5:

⭐ Audio Comprehension

The character can act on the meaning of the speech.
Use prompts to complement that behavior.

Example:

“React with curiosity when asking questions. Smile when mentioning something positive.”

⭐ Precise Response to Instructions

The model follows motion, acting, and camera directions.

Prompt example:

“Slight camera push-in, gentle head movement, expressive eyes.”

⭐ Vivid Emotional Performance

You can request specific emotional levels.

Examples:

  • “Extremely emotional, teary eyes, voice trembling.”

  • “Light emotional tone, soft and warm.”

  • “High-energy influencer vibe, expressive gestures.”

⭐ Multi-person Scenario Support

(Lip Sync currently supports only a single person, but the same logic still applies.)

You can still reference an imaginary second person:

“Looking slightly off-camera as if talking to someone in the room.”

⭐ Diverse Styles

Supports realistic, cinematic, dramatic, vlog, music-video, and more.

Examples:

  • “Cinematic, shallow depth of field.”

  • “TikTok influencer style.”

  • “Documentary interview tone.”

  • “Dramatic theater-style expression.”

🎭 5. Scenario-Based Prompting (Based on Official Scenario Cases)

The OmniHuman-1.5 Scenario Cases cover:
Film / Short Video, Fantasy Vlogs, AI Music Content, UGC Content.

Below is prompting guidance for each, adapted for Lip Sync.

Scenario Type 1: Film, TV & Cinematic Narration

Goal:

High drama, strong emotion, strong performance.

Prompt Examples:

  • “Dramatic close-up, slow emotional delivery, intense eye contact, slight camera drift.”

  • “Film-grade performance, deep breath before speaking, expressive eyebrows, cinematic color tone.”

  • “Serious and cold tone, minimal movement, eyes locked on camera.”

Best Uses:

  • monologues

  • storytelling

  • emotional speeches

Scenario Type 2: Fantasy Vlog / Character Roleplay

Goal:

Become an in-world persona.

Prompt Examples:

  • “Playful fantasy character, curious tone, light magical wonder in expression.”

  • “Storyteller mood, dramatic pauses, expressive gestures.”

  • “Warm, friendly guide talking to the viewer in a fantasy environment.”

Best Uses:

  • lore videos

  • character introductions

  • story expansions

Scenario Type 3: AI MV / Music Content

Goal:

Music-video style charisma.

Prompt Examples:

  • “Strong emotional delivery, rhythmic head movements matching the beat.”

  • “Bold and expressive performance, open smile, confident vibe.”

  • “MV-style dramatic lighting, slow-motion feel, intense expressions.”

Best Uses:

  • lip-sync music videos

  • expressive emotional content

  • reaction videos

Scenario Type 4: UGC / Social Media Content

Goal:

Natural, casual, authentic communication.

Prompt Examples:

  • “TikTok influencer tone, friendly smile, natural micro-gestures.”

  • “Talking casually as if recording from home, relaxed posture.”

  • “Fast, energetic, expressive eyes, engaging tone.”

Best Uses:

  • product promos

  • reactions

  • influencer-style content

6. Advanced Prompt Techniques

1. Emotion Intensity Levels

Try specifying strength:

  • “Very subtle expression.”

  • “Medium emotion, natural.”

  • “High-intensity acting, dramatic facial movement.”

2. Camera Movement

OmniHuman 1.5 supports camera motion.

Examples:

  • “Slow push-in.”

  • “Gentle left-right drift.”

  • “Light handheld feel.”

  • “No camera movement.”

3. Micro-Gestures

Small, specific gestures are extremely effective:

  • “Soft blinking.”

  • “Small head nods.”

  • “Slight eyebrow movement.”

  • “Gentle smile transitions.”

4. Persona & Character Type

A persona defines the overall vibe:

  • “Teacher explaining calmly.”

  • “Influencer speaking confidently.”

  • “Villain with slow, cold delivery.”

  • “Friendly AI assistant.”

5. Scene Mood

Even if the background doesn’t change, mood affects acting:

  • “Cozy and warm, relaxed vibe.”

  • “Dark and serious tone.”

  • “Professional business-style delivery.”

7. What to Avoid

  • ❌ Overly long instructions

  • ❌ Conflicting emotions

  • ❌ Camera instructions that contradict the perspective of the input image

  • ❌ Describing physical actions that are impossible in the shot (walking, jumping, etc.)

  • ❌ Demanding expressions opposite to the tone of the audio

  • ❌ Asking to change identity
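
Some of these checks can be automated before a prompt is submitted. The Python sketch below is a rough heuristic, not an official validator. The word limit, the keyword lists, and the name `lint_prompt` are all assumptions made for illustration, and simple keyword matching will miss many real conflicts.

```python
import re
from typing import List

# Illustrative limits and word lists (assumptions); tune them for your content.
MAX_WORDS = 60
FULL_BODY_ACTIONS = ("walking", "jumping", "running")
IDENTITY_CHANGES = ("change the face", "different person", "change identity")
CONFLICTING_PAIRS = (
    ("laughing", "whispering"),
    ("energetic", "calm"),
    ("cheerful", "sad"),
)


def lint_prompt(prompt: str) -> List[str]:
    """Return warnings for a prompt; an empty list means no rule was triggered."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z']+", text))
    warnings = []
    if len(text.split()) > MAX_WORDS:
        warnings.append("prompt is long; consider trimming it")
    for action in FULL_BODY_ACTIONS:
        if action in words:
            warnings.append(f"full-body action: {action}")
    for phrase in IDENTITY_CHANGES:
        if phrase in text:
            warnings.append(f"identity change request: {phrase}")
    for first, second in CONFLICTING_PAIRS:
        if first in words and second in words:
            warnings.append(f"conflicting directions: {first} + {second}")
    return warnings


bad = "Fast camera spin, walking through a forest, laughing loudly, whispering seductively."
good = "Soft natural expression, slight head movement, gentle smile."
print(lint_prompt(bad))
print(lint_prompt(good))
```

The two sample prompts are the over-directed and the simple example from Principle 3. The checker flags the first and passes the second, but it cannot tell whether a prompt matches the audio, so that still needs a manual check.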

8. Ready-to-Use Prompt Examples

🔹 Natural Social Media Talk

“Friendly influencer tone, soft smile, relaxed micro-gestures, natural head movement, warm energy.”

🔹 Cinematic Monologue

“Serious emotional monologue, slow delivery, deep eye contact, dramatic lighting tone.”

🔹 Enthusiastic Vlogger

“Energetic, expressive gestures, big smile, lively eye movement, confident tone.”

🔹 Calm & Soft

“Warm, gentle tone, minimal movement, soft smile, subtle expressions.”

🔹 Dramatic Character Roleplay

“Intense roleplay tone, expressive eyes, theatrical pacing, controlled but dramatic gestures.”
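
If you reuse these prompts often, they can be kept as named presets and extended with one extra direction per video. The Python sketch below shows one way to do that. The preset keys and the `get_preset` helper are hypothetical names, and the prompt texts are copied from the examples above.

```python
# Ready-to-use prompts from this guide, keyed by illustrative preset names.
PRESETS = {
    "natural_social": "Friendly influencer tone, soft smile, relaxed micro-gestures, "
                      "natural head movement, warm energy.",
    "cinematic_monologue": "Serious emotional monologue, slow delivery, deep eye contact, "
                           "dramatic lighting tone.",
    "enthusiastic_vlogger": "Energetic, expressive gestures, big smile, lively eye movement, "
                            "confident tone.",
    "calm_soft": "Warm, gentle tone, minimal movement, soft smile, subtle expressions.",
    "dramatic_roleplay": "Intense roleplay tone, expressive eyes, theatrical pacing, "
                         "controlled but dramatic gestures.",
}


def get_preset(name: str, extra: str = "") -> str:
    """Return a preset prompt, optionally followed by one extra direction."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    base = PRESETS[name]
    extra = extra.strip()
    return f"{base} {extra}" if extra else base


print(get_preset("calm_soft", "No camera movement."))
```

Keeping the extra direction to a single short sentence helps avoid the over-directing described in Principle 3.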

🏁 Conclusion

Prompting in Lip Sync is about directing a digital actor.
The image defines who they are.
The audio defines what they say.
The prompt defines how they say it.

With the right combination, you can create:

  • cinematic storytelling

  • influencer videos

  • character vlogs

  • emotional performances

  • music-style lip syncs

  • fantasy roleplay

  • natural casual talking content

Use this guide as your foundation to build powerful, expressive videos with OmniHuman-1.5.