Advanced Workflow: Baby Face Deep Fake AI Video

A Step-by-Step Guide to Full AI Deepfake Video Creation... but for babies

Tom Babb
May 16, 2025

ICYMI: I'm breaking down how I made the video below, which was an announcement for my company, SkyFi.

This is the more technical breakdown of how I created a talking baby video of Bill Perkins without his involvement, using only public content and AI tools. Note: I work for Bill. I don't recommend doing this to strangers.

The process has three main components:

Voice replication
Scriptwriting in Bill’s tone
Face generation and animation

Step 1: Source a High-Quality Voice Sample

I searched YouTube for a high-production interview with Bill Perkins that was at least 5 minutes long.
I selected a clip from his appearance on The Pomp Podcast, which had clean audio and good delivery.
I used a YouTube to MP3 converter to download a 3–5 minute audio segment from the interview. This would be used later to train the voice clone.

Step 2: Analyze Voice and Tone from Transcript

I took the same YouTube link and used a transcription tool to get a clean text transcript of Bill’s segment.
I uploaded the transcript to ChatGPT (GPT-4o) and asked:
Prompt: “Please give me a detailed voice and tone breakdown of Bill Perkins based on this transcript.”
The output gave me a detailed summary of his speaking style, personality, phrasing, and delivery cadence. That output becomes your tone prompt for the next step. Literally copy and paste it.
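If you would rather paste the transcript inline than upload it as a file, the Step 2 prompt can be assembled with a small helper. This is a minimal sketch, not what I actually ran: the function name `build_tone_analysis_prompt` and the "Transcript:" label are my own assumptions, and only the first sentence of the prompt comes from the step above.

```python
def build_tone_analysis_prompt(speaker: str, transcript: str) -> str:
    """Compose the Step 2 prompt, followed by the transcript text.

    Hypothetical helper: the layout below is an assumption, not a
    format that ChatGPT requires.
    """
    # Collapse line breaks and repeated spaces left over from transcription.
    cleaned = " ".join(transcript.split())
    if not cleaned:
        raise ValueError("transcript is empty")
    return (
        f"Please give me a detailed voice and tone breakdown of {speaker} "
        "based on this transcript.\n\n"
        f"Transcript:\n{cleaned}"
    )


if __name__ == "__main__":
    sample = "So here's the thing.\nYou only get   so many summers."
    print(build_tone_analysis_prompt("Bill Perkins", sample))
```

Paste the printed text into the chat as a single message.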

Step 3: Scriptwriting in Bill’s Voice

I uploaded that tone description into ChatGPT along with a real announcement I wanted him to comment on (in this case, a DIU government contract we secured).
I prompted GPT to write:
Prompt: “Write three short scripts reacting to this announcement in the voice and tone of Bill Perkins. [insert voice and tone description]”
I selected the one that best fit the video I wanted to create. I always ask for three or more options.
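The Step 3 prompt has three moving parts: the number of options, the tone description from Step 2, and the announcement. Here is a rough sketch of how they fit together. The function name `build_script_prompt` and the section labels are assumptions of mine, and the default of three options reflects my own habit, not a requirement of the tool.

```python
def build_script_prompt(
    speaker: str,
    tone_description: str,
    announcement: str,
    options: int = 3,
) -> str:
    """Compose the Step 3 scriptwriting prompt.

    Hypothetical helper: the labels are illustrative, so adjust the
    wording to taste.
    """
    if options < 1:
        raise ValueError("options must be at least 1")
    tone = tone_description.strip()
    news = announcement.strip()
    if not tone or not news:
        raise ValueError("tone_description and announcement must be non-empty")
    return (
        f"Write {options} short scripts reacting to this announcement "
        f"in the voice and tone of {speaker}.\n\n"
        f"Voice and tone description:\n{tone}\n\n"
        f"Announcement:\n{news}"
    )


if __name__ == "__main__":
    print(
        build_script_prompt(
            "Bill Perkins",
            "<paste the Step 2 output here>",
            "<paste the announcement text here>",
        )
    )
```

Asking for several options in one prompt makes it easy to compare takes side by side before picking one.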

Step 4: Train a Voice Clone in ElevenLabs

I went to ElevenLabs and created a new custom voice.
- Voices -> + -> Instant Voice Clone
I uploaded at least three minutes of the MP3 clip from earlier to train the model.
- Tip: ElevenLabs needs clear, clean audio of the same speaker for best results.
Once trained, I pasted the AI-generated script into ElevenLabs and generated Bill’s voice using the cloned model.
- You can adjust pitch, inflection, and style to get it closer to how he naturally speaks.

Step 5: Generate a Babyface Image of Bill Perkins

I opened GPT-4o and uploaded multiple images of Bill Perkins from different angles (via Google Images).
- Tip: Uploading 2–3 well-lit, forward-facing images improves accuracy.
I clicked “Create Image” and prompted:
Prompt: “Make an ultra-realistic baby version of this person.”
- If you don't see “Create Image” as an option when you prompt, you need to upgrade to ChatGPT Plus ($20/month).
I refined the result with direct feedback like:
Prompts:
“Make the baby smile more”
“Add more hair”
“Reduce puffiness in the cheeks”
I iterated until the image resembled a baby version of Bill that felt recognizable. For the video, I even added the SkyFi hat that he often wears.

Step 6: Animate the Face with HeyGen

I logged into HeyGen and selected the Photo-to-Video AI (IV) tool.
Workflow inside HeyGen:
1. Upload the baby image on the left panel
2. Upload the MP3 audio (Bill’s AI-cloned voice) on the right
3. Click “Generate Video”
Within minutes, I had a fully animated deepfake baby video of our cofounder speaking with his tone, his voice, and my chosen script. He approved.

Optional: Post-Production Polish

To finalize the video:

I imported the rendered clip into Premiere Pro.
I added:
- Captions
- A branded logo
- A headline overlay referencing the DIU news
This part is optional, but highly recommended for polish if you’re publishing to social platforms or embedding in newsletters.

Tools Used:

Tool	Purpose
YouTube > MP3	Extract public audio
Transcription Tool	Get accurate transcript
ChatGPT	Analyze voice & write script
ElevenLabs	Voice cloning & synthesis
GPT-4o	Image generation
HeyGen	Lip-sync video creation
Premiere Pro (optional)	Final video edits

This process can be adapted for any public figure with enough audio and visual reference material. It’s a full-stack demonstration of what AI can do when applied precisely.

Yes, it’s a deepfake. Yes, it’s that good.

Let me know if you want a downloadable version, workflow diagram, or a template prompt pack for automating these parts. Here's my LinkedIn.