Digital Avatar Flow

The Property Joes Group -- Powered by The Curator
Face-on-Camera Source Videos: 3,801
Est. Monthly Cost: $34-44
Pipeline Steps: 5
Avatar Video / Month: 30 min

What This Does

Creates a digital talking-head avatar of Joseph -- his actual face, speaking in his cloned voice, from any written script. No filming needed. Used for newsletter videos, listing walkthroughs, transaction follow-ups, and social media content at scale.

Companion to the Voice Digitization Flow -- the voice pipeline produces the audio, this flow adds the face.

Script -> Cloned Voice -> Avatar Video -> Review -> Publish

The "Digital Joseph" Stack

F5-TTS Voice Clone + HeyGen Custom Avatar = Digital Joseph

Voice cloning produces audio in Joseph's voice ($0.014/run). The avatar engine adds his face, lip-synced to that audio ($29/mo). Together they are intended to produce short-form talking-head video that viewers cannot readily tell apart from a real recording; the Step 4 review gate checks each video against that bar.

Integrated Recommendation

Voice: F5-TTS via Replicate -- $0.014/run, already tested, API token active.
Avatar: HeyGen Creator plan -- $29/month, custom digital twin from BombBomb videos, pipeline scaffold exists.
Total: ~$34-44/month for up to 30 minutes of "Digital Joseph" video per month (the HeyGen Creator credit cap).

Use Cases

Primary Newsletter Video

60-90 second market update embedded in monthly email. ~$1.50/video

Listing Walkthrough Intro

30-60 second property introduction per listing. ~$0.70/video

Transaction Follow-Up

15-30 second personalized congratulations at close. ~$0.35/video

Social Media Clips

15-30 second IG Reels / LinkedIn / FB clips. ~$0.35/video

Referral Brief

30-60 second video intro for referral partners. ~$0.70/video

Market Update Teaser

30-60 second monthly market insight clip. ~$1.00/video

What Needs Joseph's Approval

HeyGen Creator plan -- $29/month. This is the production avatar engine. Cannot proceed to production quality without it.

Today (no new subscription): We can run a SadTalker prototype via Replicate with the existing headshot + voice clone. Lower quality, but it proves the concept for ~$0.20 per run.

Integrated Voice + Avatar Pipeline

How the two flows chain together:

Written Script -> F5-TTS (Voice Clone) -> WAV Audio -> HeyGen Avatar -> MP4 Video

Voice Digitization Flow produces the audio. This Avatar Flow adds the face. Both feed the Content pipeline.
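The chain above can be sketched as a small orchestrator. This is an illustration only, not the contents of voice_avatar_pipeline.py: the names run_avatar_flow and PipelineResult are hypothetical, and each stage is passed in as a callable so the ordering and the publish condition are the only things shown.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class PipelineResult:
    audio_path: Path
    video_path: Path
    published: bool


def run_avatar_flow(
    script_text: str,
    clone_voice: Callable[[str], Path],     # script text -> WAV path (voice flow)
    render_avatar: Callable[[Path], Path],  # WAV path -> MP4 path (avatar flow)
    approve: Callable[[Path], bool],        # MP4 path -> Joseph's decision
    publish: Callable[[Path], None],        # MP4 path -> hand-off to content pipeline
) -> PipelineResult:
    """Run the stages in order; publish only when the review is approved."""
    audio_path = clone_voice(script_text)
    video_path = render_avatar(audio_path)
    approved = approve(video_path)
    if approved:
        publish(video_path)
    return PipelineResult(audio_path, video_path, published=approved)
```

Passing the stages in as callables keeps the prototype engine (SadTalker) and the production engine (HeyGen) interchangeable behind render_avatar.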

Step 1: Script Generation

Input: Blog post, newsletter content, listing data, or follow-up template

Output: Spoken script matched to Joseph's voice style (warm, direct, "Hey y'all" openers)

Tool: voice_avatar_pipeline.py --script-from-blog (already functional)

Step 2: Voice Cloning (from Voice Digitization Flow)

Input: Script text + 60-second reference audio of Joseph

Output: WAV audio file -- Joseph's cloned voice speaking the script

Engine: F5-TTS via Replicate (Active, $0.014/run)

Already tested: jrd-f5tts-test.wav (13.3 seconds) exists from prior run
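Before spending a run, it may help to confirm that the reference clip really is about 60 seconds and to measure the output length (the 13.3-second figure above). A minimal sketch using only the standard library; the function names and the 5-second tolerance are assumptions, and the wave module reads PCM WAV files only.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: Path, target_s: float = 60.0, tolerance_s: float = 5.0) -> bool:
    """True if the clip length is within tolerance_s of target_s (tolerance is an assumed value)."""
    return abs(wav_duration_seconds(path) - target_s) <= tolerance_s
```

For example, check_reference_clip(Path("data/voice-samples/jrd-voice-sample-60s.wav")) would be the check on the reference file listed under Source Assets.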

Step 3: Avatar Video Generation

Input: WAV audio + custom avatar (trained from BombBomb video)

Output: MP4 video -- Joseph's face lip-synced to cloned voice, 1080p

Production engine: HeyGen Avatar V (Needs API Key, $29/mo)

Prototype engine: SadTalker via Replicate (Active, ~$0.20/run)

Step 4: Review Gate

Criteria: Face looks natural (7+/10), lip-sync matches audio (7+/10), overall "Is this me?" (7+/10)

Gate: Joseph watches and approves before any publish. No exceptions.
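The gate can be expressed as a simple check: all three scores at 7 or higher, plus Joseph's explicit approval. A sketch under those assumptions; the Review type and its field names are hypothetical.

```python
from dataclasses import dataclass

PASS_SCORE = 7  # the "7+/10" threshold from the criteria above


@dataclass(frozen=True)
class Review:
    face_natural: int         # 0-10
    lip_sync: int             # 0-10
    is_this_me: int           # 0-10
    approved_by_joseph: bool  # explicit sign-off, required in every case


def passes_gate(review: Review) -> bool:
    """True only if every score meets the threshold and Joseph approved."""
    scores = (review.face_natural, review.lip_sync, review.is_this_me)
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    return review.approved_by_joseph and all(s >= PASS_SCORE for s in scores)
```

High scores never override a missing approval, which matches the "No exceptions" rule.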

Step 5: Publish

Outlets: Email embed, social media upload, listing page embed, direct message

Integration: Content pipeline MICRO layer handles finishing: media review, content tracker, publish, performance tracking.

Platform Comparison

| Platform | Quality | Cost/mo | Best For | Status |
|---|---|---|---|---|
| HeyGen | 9/10 | $29 | Production content at scale | Need key |
| D-ID | 7/10 | $16-48 | Quick photo-based clips | Need key |
| Synthesia | 8/10 | $29-89 | Training / onboarding videos | Need key |
| SadTalker (Replicate) | 6/10 | ~$5 usage | Prototyping from photo | Ready |
| Video-ReTalking (Replicate) | 7/10 | ~$10 usage | Re-dub existing BB videos | Ready |
| Wav2Lip (Replicate) | 7/10 | ~$5 usage | Lip-sync replacement | Ready |

Why HeyGen Wins for TPJG

1. Best-in-class custom avatar -- Avatar V creates the most realistic digital twin from uploaded video. With 3,801 BombBomb videos as source material, the training data is world-class.

2. Pipeline scaffold exists -- voice_avatar_pipeline.py already has the full HeyGen API integration coded (audio upload, video generation, polling, download). It only needs HEYGEN_API_KEY and HEYGEN_AVATAR_ID to be set.

3. 30 min/month covers all use cases -- Newsletter (1.5 min) + listings (5 min) + follow-ups (2 min) + social (4 min) + referral (1 min) = ~13.5 min. Headroom for growth.

4. Audio input mode -- HeyGen accepts our F5-TTS cloned voice as audio input, giving us full control over voice quality rather than relying on HeyGen's own TTS.

First 3 Actions

Source Assets (Verified)

| Asset | Details | Location |
|---|---|---|
| BombBomb face videos | 3,801 (3,798 with H264 URLs) | memories/knowledge/bombbomb-videos/*.json |
| Headshot (padded) | 1 JPEG (380KB) | data/voice-samples/jrd-headshot-padded.jpg |
| Voice reference | 60s WAV | data/voice-samples/jrd-voice-sample-60s.wav |
| F5-TTS test output | 13.3s WAV | data/voice-samples/jrd-f5tts-test.wav |
| Pipeline scaffold | HeyGen integration (lines 230-347) | tools/voice_avatar_pipeline.py |

API Keys Status

| Key | Status | Notes |
|---|---|---|
| REPLICATE_API_TOKEN | Active | In .env, tested, service account |
| HEYGEN_API_KEY | Missing | Needs Creator plan signup ($29/mo) |
| HEYGEN_AVATAR_ID | Missing | Created after uploading training video to HeyGen |
| ELEVENLABS_API_KEY | Missing | Future upgrade path, not needed now |
| D_ID_API_KEY | Missing | Optional, not recommended as primary |
| SYNTHESIA_API_KEY | Missing | Optional, not recommended for our use case |
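A quick way to confirm this status is to check which variables are actually set. The sketch below assumes the .env values have already been loaded into the process environment (loading the file is not shown), and the missing_keys helper and the two groupings are hypothetical.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_NOW = ("REPLICATE_API_TOKEN",)                      # prototype path
REQUIRED_FOR_HEYGEN = ("HEYGEN_API_KEY", "HEYGEN_AVATAR_ID")  # production path


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

An empty list from missing_keys(REQUIRED_FOR_HEYGEN) would mean the production path has what it needs.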

Replicate Models (Available Now)

| Model | Runs | Cost/Run | Input | Use Case |
|---|---|---|---|---|
| cjwbw/sadtalker | 172,523 | ~$0.10-0.30 | Photo + audio | Animate photo into talking head |
| chenxwh/video-retalking | 33,237 | ~$0.40 | Video + audio | Re-dub existing video with new audio |
| devxpy/cog-wav2lip | 3,659,285 | ~$0.05-0.15 | Video + audio | Replace lips only in existing video |
| lucataco/f5-tts | -- | ~$0.014 | Text + ref audio | Voice clone (companion flow) |

HeyGen API Integration (Scaffold)

File: tools/voice_avatar_pipeline.py, lines 230-347

Function: generate_avatar_from_audio(audio_path, output_path)

Flow: Upload audio asset -> Create video generation task (avatar_id + audio_asset_id) -> Poll for completion (max 5 min) -> Download MP4

Endpoint: https://api.heygen.com/v2/video/generate

Activation: Set HEYGEN_API_KEY and HEYGEN_AVATAR_ID in .env. The scaffold handles everything else.
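The generate and poll steps of that flow might look roughly like the sketch below. It is not the scaffold's actual code. The request field names and the "completed" / "failed" status strings are assumptions to verify against HeyGen's API reference, and the status lookup is injected as a callable so no real network call is shown. Upload and download are omitted.

```python
import time
from typing import Any, Callable, Dict

GENERATE_URL = "https://api.heygen.com/v2/video/generate"  # endpoint named in this document


def build_generate_payload(avatar_id: str, audio_asset_id: str) -> Dict[str, Any]:
    """Request body for the generate call. Field names are ASSUMED, not verified."""
    return {
        "video_inputs": [{
            "character": {"type": "avatar", "avatar_id": avatar_id},
            "voice": {"type": "audio", "audio_asset_id": audio_asset_id},
        }],
        "dimension": {"width": 1920, "height": 1080},  # 1080p target from Step 3
    }


def poll_until_done(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,   # "max 5 min" from the flow above
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):  # assumed terminal status strings
            return status
        if clock() >= deadline:
            raise TimeoutError(f"video not finished after {timeout_s:.0f} s")
        sleep(interval_s)
```

Injecting sleep and clock makes the timeout behavior testable without waiting five minutes.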

HeyGen Credit Math

Creator plan: 600 credits/month at $29

Avatar V: 20 credits/minute of video

Capacity: 600 / 20 = 30 minutes of Avatar V video per month

Estimated usage: ~13.5 min/month across all use cases. Headroom: 16.5 min unused.

Upgrade trigger: If usage exceeds 25 min/month consistently, upgrade to Business ($149/mo, 1,500 credits = 75 min).
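The credit arithmetic above can be checked in a few lines. The plan prices, credit counts, per-minute rate, and usage estimates come from this document. The reading of "consistently" as three consecutive months is an assumption, as are the function names.

```python
from typing import Dict, List

CREDITS_PER_MINUTE = 20  # Avatar V rate
PLAN_CREDITS: Dict[str, int] = {"Creator": 600, "Business": 1500}

# Estimated monthly usage in minutes, per use case.
USAGE_MIN: Dict[str, float] = {
    "newsletter": 1.5, "listings": 5.0, "follow_ups": 2.0, "social": 4.0, "referral": 1.0,
}


def capacity_minutes(plan: str) -> float:
    """Minutes of Avatar V video a plan's monthly credits cover."""
    return PLAN_CREDITS[plan] / CREDITS_PER_MINUTE


def headroom_minutes(plan: str, usage: Dict[str, float]) -> float:
    """Unused minutes after the estimated usage."""
    return capacity_minutes(plan) - sum(usage.values())


def should_upgrade(monthly_minutes: List[float], trigger_min: float = 25.0, months: int = 3) -> bool:
    """True if usage exceeded the trigger in each of the last `months` months (assumed window)."""
    recent = monthly_minutes[-months:]
    return len(recent) == months and all(m > trigger_min for m in recent)


print(capacity_minutes("Creator"), headroom_minutes("Creator", USAGE_MIN))  # prints 30.0 16.5
```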

Open-Source Alternatives (No GPU Path)

All run on Replicate's hosted GPUs using our existing API token. No local GPU required.

SadTalker: Best for photo-to-video. Single image + audio. Head motion generated. Quality 6/10 -- artifacts on longer clips but acceptable for prototyping.

Video-ReTalking: Best for re-dubbing. Three-stage pipeline: normalize expressions, sync lips, enhance face. Takes existing BB video + new audio. Quality 7/10.

Wav2Lip: Most popular (~3.66M runs). Only changes lip region. Minimal artifacts but can look "pasted." Quality 7/10 for lip accuracy.

Hedra / EMO / LivePortrait: Not practical. Hedra has limited API. EMO is research-only. LivePortrait needs local GPU.

Monthly Cost Projection

| Component | Monthly Cost | What You Get |
|---|---|---|
| HeyGen Creator | $29.00 | 30 min avatar video, custom digital twin, 1080p |
| F5-TTS (Replicate) | ~$5-15 | Pay-per-run voice cloning at $0.014/run |
| Total | ~$34-44 | "Digital Joseph" at scale |
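To see how the voice budget maps to run counts, a short calculation using only the two prices quoted in this document; the function names are hypothetical.

```python
HEYGEN_CREATOR_USD = 29.00
F5_TTS_USD_PER_RUN = 0.014


def runs_for_budget(voice_budget_usd: float) -> int:
    """Whole F5-TTS runs a monthly voice budget covers."""
    return int(voice_budget_usd / F5_TTS_USD_PER_RUN)


def monthly_cost(voice_runs: int) -> float:
    """HeyGen subscription plus pay-per-run voice cost, rounded to cents."""
    return round(HEYGEN_CREATOR_USD + voice_runs * F5_TTS_USD_PER_RUN, 2)


print(runs_for_budget(5), runs_for_budget(15))  # prints 357 1071
```

So the ~$5-15 line corresponds to roughly 357 to 1,071 voice runs per month on top of the fixed $29.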