How to make Kira Vasquez look like the same person across 14 shots, 7+ AI models, and 90 seconds of film.
Every AI model generates faces independently. Veo 3.1 will render Kira differently than Sora 2, which will render her differently than Kling 3.0. Without a deliberate consistency strategy, she'll look like 14 different women across 14 shots — and the film falls apart. Character consistency is THE hardest problem in AI filmmaking, and the entire industry acknowledges it. Our strategy uses Higgsfield's native tools (Soul ID, Popcorn, Kling O1 Element Library) combined with reference-image workflows to lock Kira's identity before a single frame is animated.
Before touching any AI tool, we need to lock every visual detail of Kira Vasquez on paper. The more specific we are, the more consistent every model will render her.
These details must remain identical in EVERY shot — they're what tells the audience "this is the same person": facial structure, eye color, the scar, skin tone, helmet shape, the cyan arm stripe on the fire suit, the gloves, and the car design.
This is the exact workflow to establish and maintain Kira's identity across every model and every shot. Think of it like casting an actor, then making sure wardrobe and makeup match on every shoot day.
Before any animation, create 10–20 static reference images of Kira from multiple angles and lighting conditions. This becomes the "casting portfolio" that every other tool references.
What to generate:
— Front face, neutral expression, even lighting (the "passport photo")
— 3/4 view left and right
— Profile view
— With helmet on, visor up
— With helmet on, visor down (reflections visible)
— Full body in fire suit, standing
— Cockpit seated position, hands on wheel
— Close-up of hands in gloves on steering wheel
— The car exterior from 3 angles
Which model to use for each:
— Nano Banana Pro for face + helmet shots (4K, reasoning-guided, character consistency built in, can render the visor HUD text)
— Soul for fashion-grade portraits (cinematic lighting, editorial quality, the "hero beauty" shots)
— Seedream 4.5 for the car and full-body shots (consistent lighting, identity preservation, strong spatial understanding)
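The reference sheet above is easier to keep consistent if every prompt pastes one frozen character description verbatim. A minimal sketch of that idea — the description text and shot names below are illustrative placeholders, not the real locked details for Kira:

```python
# Sketch: build the Step 1 reference-sheet prompts from ONE frozen character
# description. The description and shot list are illustrative placeholders --
# substitute your locked details for Kira and never paraphrase the string.
FROZEN_CHARACTER = (
    "Kira Vasquez, female race driver, defined facial structure, "
    "signature scar, cyan arm stripe on fire suit"
)

REFERENCE_SHOTS = [
    ("passport", "front face, neutral expression, even lighting"),
    ("three_quarter_left", "3/4 view facing left"),
    ("three_quarter_right", "3/4 view facing right"),
    ("profile", "full profile view"),
    ("helmet_visor_up", "helmet on, visor up"),
    ("helmet_visor_down", "helmet on, visor down, reflections visible"),
    ("full_body", "full body in fire suit, standing"),
    ("cockpit", "seated in cockpit, hands on wheel"),
    ("hands_closeup", "close-up of gloved hands on steering wheel"),
]

def build_prompts(character: str = FROZEN_CHARACTER) -> dict[str, str]:
    """Return shot-name -> prompt, pasting the frozen description verbatim."""
    return {name: f"{character}. {framing}." for name, framing in REFERENCE_SHOTS}

prompts = build_prompts()
```

Because the description is a single constant, every generated prompt is guaranteed to contain the exact same character wording.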
Take your best 10–15 reference images from Step 1 and train a Soul ID avatar. This creates a digital identity that Higgsfield's system "remembers" — it preserves facial geometry, bone structure, skin tone, and expression style across all future generations.
Training requirements:
— Upload 10–15 of your best Kira images
— Include multiple angles (front, 3/4, profile)
— Include multiple lighting conditions
— Include both with-helmet and without-helmet shots
— Avoid pure white backgrounds (hurts depth/blending)
— Training takes ~5 minutes, costs ~$3
After training, validate: Generate a 4–8 image grid changing ONLY the camera angle. Confirm face, bone structure, eye color, and scar are stable across all shots. If anything drifts, retrain with better reference images.
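The eyeball check on the validation grid can be backed up with numbers: extract a face embedding from each grid image with any face-recognition model, then compare each one to the reference embedding. A sketch assuming the embeddings are already computed as vectors; the 0.85 threshold is an assumption to tune against shots you've judged by eye:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face-embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_drift(reference: np.ndarray, grid: list[np.ndarray],
                threshold: float = 0.85) -> list[int]:
    """Return indices of grid images whose embedding drifts below the
    similarity threshold -- candidates for a Soul ID retrain."""
    return [i for i, emb in enumerate(grid)
            if cosine_similarity(reference, emb) < threshold]
```

Any index this returns is a grid image to inspect; if several drift, retrain with better reference images as described above.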
Use Popcorn to generate all 14 key frames as a connected storyboard sequence. Popcorn's "intelligent scene continuity" system is specifically designed to maintain character identity, lighting coherence, and style across multi-frame sequences.
How to use it:
— Feed your Soul ID avatar as the character reference
— Use Auto Mode: write a detailed prompt describing the scene, specify up to 8 frames
— Popcorn's Character Anchoring memorizes facial structure, clothing texture, posture
— Lighting Continuity system locks the mood once established (night rain, neon)
— Style Coherence prevents the "patchwork" look between frames
Generate storyboard frames for all 14 shots. These become the input key frames for video animation in Step 4.
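Once all 14 key frames exist, it helps to track which frame, model, and risk level belongs to each shot in one manifest. A minimal sketch — the file paths, model assignments, and risk tiers shown are illustrative examples drawn from the plan, not a fixed mapping:

```python
from dataclasses import dataclass

@dataclass
class Shot:
    number: int
    name: str
    storyboard_frame: str  # path to the locked Popcorn key frame
    video_model: str       # model used for image-to-video in Step 4
    face_risk: str         # "high" | "medium" | "low"

# Illustrative entries -- fill in all 14 shots for the real manifest.
MANIFEST = [
    Shot(1, "ECU Face", "frames/shot01.png", "Sora 2", "high"),
    Shot(3, "Cockpit POV", "frames/shot03.png", "Kling 3.0", "medium"),
    Shot(5, "Wide", "frames/shot05.png", "Hailuo 02", "low"),
]

# High-risk shots get the final consistency pass first.
high_risk = [s.number for s in MANIFEST if s.face_risk == "high"]
```

The manifest also makes the final side-by-side consistency pass scriptable, since every shot's key frame path lives in one place.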
Now take each Popcorn storyboard frame and animate it using image-to-video mode in the appropriate model. The key: you're not generating from text alone — you're feeding the locked key frame as the starting image.
Critical rules for consistency during animation:
— Always use image-to-video, never pure text-to-video (text-to-video ignores your character lock)
— Feed the Popcorn storyboard frame as the input image
— For models that support it, also upload the Soul ID reference images as additional character refs
— Use start/end frame control (Kling 3.0, Kling O1) to define exact transitions
— Keep prompts focused on motion and camera, never on character description — the input image already defines Kira's face, so let it do the work
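The "motion and camera only" rule is easy to break by habit, so a tiny lint pass over animation prompts can catch character descriptors before they reach a model. A sketch — the word list is an illustrative starting point, not exhaustive:

```python
# Words that re-describe the character and should stay out of animation
# prompts -- the input image already defines them. List is illustrative.
CHARACTER_WORDS = {"face", "eyes", "scar", "hair", "skin", "woman", "female"}

def lint_motion_prompt(prompt: str) -> list[str]:
    """Return any character descriptors found in an animation prompt."""
    words = {w.strip(".,").lower() for w in prompt.split()}
    return sorted(words & CHARACTER_WORDS)

lint_motion_prompt("slow dolly-in, rain streaking the visor")    # clean: []
lint_motion_prompt("close on her green eyes as the car slides")  # flags "eyes"
```

An empty result means the prompt leaves identity entirely to the locked key frame, which is the goal.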
Even with perfect references, some shots will drift. That's normal. Here's the safety net:
Kling O1 Semantic Editing: If a generated clip has the right motion but Kira's face shifted, use O1's edit mode to fix it. Upload the clip + Soul ID reference → "Replace the character's face with the reference, keep everything else unchanged." O1 does this without masking or rotoscoping.
Character Swap 2.0 / Face Swap: Higgsfield's dedicated face replacement tools. Feed a generated video clip + Soul ID reference → the tool replaces the face while preserving lighting, emotion, and motion. Use this as the last line of defense for any shot where the face doesn't match.
Popcorn Re-generation: If a key frame is off, don't start from scratch — regenerate just that frame in Popcorn with the same Soul ID and scene continuity. Popcorn's anchoring system will pull it back into alignment.
The rule: If a shot's motion and atmosphere are perfect but the face is 80% right, fix the face in post rather than re-generating the entire shot. This saves massive time.
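That triage rule can be written down as a small decision function so every drifted shot gets the same call. A sketch under the assumption that you score face match as a rough 0–1 judgment; the 0.8 cutoff mirrors the "80% right" rule of thumb above:

```python
def choose_fix(face_match: float, motion_ok: bool, atmosphere_ok: bool) -> str:
    """Pick the cheapest repair for a drifted shot, per the triage rule."""
    if not (motion_ok and atmosphere_ok):
        return "re-generate the shot"          # expensive path, last resort
    if face_match >= 0.8:
        return "face swap / O1 semantic edit"  # fix the face in post
    return "regenerate key frame in Popcorn"   # frame drifted too far
```

The ordering matters: motion and atmosphere are checked first because they're the expensive things to reproduce, and the face is the cheap thing to patch.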
How each model handles character identity — and what reference method to use.
| Model | Native Consistency | Reference Method | Max Refs | Fix Strategy |
|---|---|---|---|---|
| Soul ID | ★★★★★ Trained avatar | 10–15 training photos | 15+ | Source of truth — retrain if drift |
| Popcorn | ★★★★★ Multi-frame anchoring | Soul ID + prompt | 8 frames | Regenerate single frame |
| Kling O1 | ★★★★☆ Element Library | Upload refs to Element Library | 7 images | Semantic edit to fix face |
| Kling 3.0 | ★★★★☆ Subject reference | Start/end frame + image refs | 7 images | Re-gen with locked start frame |
| Nano Banana Pro | ★★★★☆ Reasoning-guided | Multi-image fusion | 8 images | Re-gen with more refs |
| Seedance 2.0 | ★★★☆☆ Multimodal refs | Image + video refs | Multiple | Feed more refs, re-gen |
| Sora 2 | ★★★☆☆ Temporal consistency | Single input image | 1 | Face Swap in post |
| Veo 3.1 | ★★★☆☆ Good preservation | Single input image | 1 | Face Swap in post |
| Hailuo 02 | ★★☆☆☆ Physics > faces | Single input image | 1 | Face Swap critical |
| Wan 2.5 | ★★★☆☆ Good with ref | Image-to-video + prompt | 1 | Face Swap if needed |
| Seedream 4.5 | ★★★★☆ Identity preservation | Multi-image editing | Multiple | Re-gen with refs |
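The table above is also useful as data: a script can look up a model's reference ceiling and planned fix when assigning shots. A sketch, abbreviated to a few rows from the table:

```python
# The comparison table as data (abbreviated). Values copied from the table:
# max reference images each model accepts, plus the planned fix strategy.
MODELS = {
    "Sora 2":    {"max_refs": 1, "fix": "Face Swap in post"},
    "Kling O1":  {"max_refs": 7, "fix": "Semantic edit to fix face"},
    "Hailuo 02": {"max_refs": 1, "fix": "Face Swap critical"},
}

def needs_face_swap_pass(model: str) -> bool:
    """Single-input-image models can't hold identity on their own;
    budget a Face Swap pass for every shot they render."""
    return MODELS[model]["max_refs"] == 1
```

Planning the Face Swap budget up front this way avoids discovering in the edit that half the shots came from single-reference models.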
— Always feed an input image. Text-to-video generates a new face every time; image-to-video preserves the face you've locked.
— Generate the reference sheet (Step 1) before touching any video model. This is casting day. Don't skip it.
— Train Soul ID once, then generate 8 test images at different angles. If any are off, retrain. Soul ID is your foundation.
— Write one frozen character description and paste it into every single prompt. Never paraphrase — exact words = consistent output.
— Don't generate storyboard frames in 5 different models. Use Popcorn's scene continuity system to create ALL key frames as one connected sequence.
— If a shot drifts, use O1's Element Library + semantic editing to fix the face without re-generating the motion. Save your best takes.
— Before the final edit, lay all 14 shots side by side. Flag any where Kira looks different and fix them with Face Swap / O1. This step is non-negotiable.
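The side-by-side pass is easiest as a single contact sheet: one representative frame per shot, tiled into a grid so drift jumps out at a glance. A sketch using Pillow; the frame paths are hypothetical:

```python
from PIL import Image

def contact_sheet(paths: list[str], thumb: tuple[int, int] = (320, 180),
                  cols: int = 7) -> Image.Image:
    """Tile one representative frame per shot into a grid so face drift
    across all shots is visible at a glance."""
    rows = -(-len(paths) // cols)  # ceiling division
    sheet = Image.new("RGB", (cols * thumb[0], rows * thumb[1]), "black")
    for i, path in enumerate(paths):
        img = Image.open(path).resize(thumb)
        sheet.paste(img, ((i % cols) * thumb[0], (i // cols) * thumb[1]))
    return sheet

# Hypothetical usage: 14 shots -> a 7x2 grid.
# contact_sheet([f"frames/shot{n:02d}.png" for n in range(1, 15)]).save("pass.png")
```

With 14 shots and 7 columns this yields two rows, which fits comfortably on one screen for the flag-and-fix review.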
Shots 01, 11 (ECU Face) — Highest risk. These are the close-ups where any face drift is immediately visible. Use Soul ID + Sora 2 (strongest temporal face consistency among the single-image models) or Kling 3.0 (when lip-sync is needed). Always validate against the reference sheet.
Shots 02, 05, 12, 13 (Wide) — Lower risk. Kira is small in frame or in a helmet. The car and environment matter more than facial detail. Focus on matching the helmet shape, arm stripe, and car design.
Shots 03, 07, 09, 10 (Cockpit POV) — Medium risk. We see gloves, hands, steering wheel, HUD — but not much face. The visual anchors here are the gloves, the MANUAL toggle, and the cyan arm stripe. Lock these props in Popcorn frames.
Shot 06 (Overtake Montage) — Quick cuts hide small inconsistencies. Use Seedance's multi-reference system to keep the car consistent; the face is barely visible at speed.
Shot 08 (Near-Spin) — The car is the character here. Lock the car design using Hailuo 02 or Sora 2's physics engine. Face not visible.