Advanced Workflow: Baby Face Deep Fake AI Video
A Step-by-Step Guide to Full AI Deepfake Video Creation... but for babies
ICYMI: I'm breaking down how I made the video below, which was an announcement for my company, SkyFi.
This is the more technical breakdown of how I created a talking baby video of Bill Perkins – without his involvement, using only public content and AI tools. Note, I work for Bill. I don't recommend doing this for strangers.
The process has three main components:
Voice replication
Scriptwriting in Bill’s tone
Face generation and animation
Step 1: Source a High-Quality Voice Sample
I searched YouTube for a high-production interview with Bill Perkins that was at least 5 minutes long.
I selected a clip from his appearance on The Pomp Podcast, which had clean audio and good delivery.
I used a YouTube to MP3 converter to download a 3–5 minute audio segment from the interview. This would be used later to train the voice clone.
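If the converter gives you the full interview rather than a short segment, the cut can be scripted. The sketch below is not part of my original workflow: it assumes ffmpeg is installed, and the file names, timestamps, and function names are hypothetical. It checks that the chosen segment falls in the 3–5 minute range and builds the ffmpeg command to extract it.

```python
def parse_timestamp(ts: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' into whole seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_trim_command(src: str, dst: str, start: str, end: str,
                       min_s: int = 180, max_s: int = 300) -> list[str]:
    """Return an ffmpeg argument list that cuts [start, end] out of src."""
    start_s, end_s = parse_timestamp(start), parse_timestamp(end)
    length = end_s - start_s
    # Enforce the 3-5 minute window used for the voice sample.
    if not min_s <= length <= max_s:
        raise ValueError(f"segment is {length}s; expected {min_s}-{max_s}s")
    # -ss seeks to the start, -t limits the duration, -c copy avoids re-encoding.
    return ["ffmpeg", "-ss", str(start_s), "-t", str(length),
            "-i", src, "-c", "copy", dst]


if __name__ == "__main__":
    # Hypothetical file names and timestamps, for illustration only.
    print(" ".join(build_trim_command("interview.mp3", "sample.mp3",
                                      "12:30", "16:30")))
```

The returned list can be passed to `subprocess.run` as-is. Keeping the length check separate from the actual cut makes it easy to try a few candidate segments before committing to one.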
Step 2: Analyze Voice and Tone from Transcript
I took the same YouTube link and used a transcription tool to get a clean text transcript of Bill’s segment.
I uploaded the transcript to ChatGPT (GPT-4o) and asked:
Prompt: “Please give me a detailed voice and tone breakdown of Bill Perkins based on this transcript.”
The output gave me a detailed summary of his speaking style, personality, phrasing, and delivery cadence. That summary becomes part of your next prompt. Literally copy and paste it.
Step 3: Scriptwriting in Bill’s Voice
I uploaded that tone description into ChatGPT along with a real announcement I wanted him to comment on (in this case, a DIU government contract we secured).
I prompted ChatGPT to write:
Prompt: “Write three short scripts reacting to this announcement in the voice and tone of Bill Perkins. [insert voice and tone description]”
I selected the one that best fit the video I wanted to create. I always ask for three or more options.
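If you reuse this flow often, the two prompts from Steps 2 and 3 can be kept as templates. This is a minimal sketch, not something I used for the video: the function names and the layout of the announcement and tone sections are my own assumptions, and the output is meant to be pasted into ChatGPT by hand.

```python
TONE_PROMPT = (
    "Please give me a detailed voice and tone breakdown of {speaker} "
    "based on this transcript.\n\n{transcript}"
)

SCRIPT_PROMPT = (
    "Write {count} short scripts reacting to this announcement in the "
    "voice and tone of {speaker}.\n\n"
    "Announcement:\n{announcement}\n\n"
    "Voice and tone description:\n{tone}"
)


def build_tone_prompt(speaker: str, transcript: str) -> str:
    """Step 2: ask for a voice and tone breakdown of the transcript."""
    return TONE_PROMPT.format(speaker=speaker, transcript=transcript.strip())


def build_script_prompt(speaker: str, announcement: str, tone: str,
                        count: int = 3) -> str:
    """Step 3: ask for several script options in the analyzed tone."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return SCRIPT_PROMPT.format(count=count, speaker=speaker,
                                announcement=announcement.strip(),
                                tone=tone.strip())


if __name__ == "__main__":
    # Placeholder inputs; paste the real announcement and tone breakdown here.
    print(build_script_prompt("Bill Perkins",
                              "<announcement text>",
                              "<tone breakdown from Step 2>"))
```

The default of three options matches the habit described above; raise `count` if you want more drafts to choose from.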
Step 4: Train a Voice Clone in ElevenLabs
I went to ElevenLabs and created a new custom voice.
Voices -> + -> Instant Voice Clone
I uploaded at least three minutes of the MP3 clip from earlier to train the model.
Tip: ElevenLabs needs clear, clean audio of the same speaker for best results.
Once trained, I pasted the AI-generated script into ElevenLabs and generated Bill’s voice using the cloned model.
You can adjust pitch, inflection, and style to get it closer to how he naturally speaks.
Step 5: Generate a Babyface Image of Bill Perkins
I opened GPT-4o and uploaded multiple images of Bill Perkins from different angles (found via Google Images).
Tip: Uploading 2–3 well-lit, forward-facing images improves accuracy.
I clicked “Create Image” and prompted:
Prompt: “Make an ultra-realistic baby version of this person.”
If you don't see “Create Image” as an option when you prompt, you need to upgrade to ChatGPT Plus ($20/month).
I refined the result with direct feedback like:
Prompt:
“Make the baby smile more”
“Add more hair”
“Reduce puffiness in the cheeks”
I iterated until the image resembled a baby version of Bill that felt recognizable. For the video, I even used a SkyFi hat that he often wears.
Step 6: Animate the Face with HeyGen
I logged into HeyGen and selected the Photo-to-Video AI (IV) tool.
Workflow inside HeyGen:
Upload the baby image on the left panel
Upload the MP3 audio (Bill’s AI-cloned voice) on the right
Click “Generate Video”
Within minutes, I had a fully animated deepfake baby video of our cofounder speaking with his tone, his voice, and my chosen script. He approved.
Optional: Post-Production Polish
To finalize the video:
I imported the rendered clip into Premiere Pro.
I added:
Captions
A branded logo
A headline overlay referencing the DIU news
This part is optional, but highly recommended for polish if you’re publishing to social platforms or embedding in newsletters.
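Captions can also be prepared outside the editor as an SRT file and then imported. The sketch below is an assumption on my part rather than what I did for this video: it spreads the script lines across the clip in proportion to their length, which is only a rough guess at the real timing, so expect to nudge the cues afterward. The function names and sample values are hypothetical.

```python
def format_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def script_to_srt(lines: list[str], duration_s: float) -> str:
    """Build SRT text, giving each line time proportional to its length."""
    total_chars = sum(len(line) for line in lines)
    if total_chars == 0:
        raise ValueError("script is empty")
    total_ms = round(duration_s * 1000)
    blocks = []
    elapsed_chars = 0
    for index, line in enumerate(lines, start=1):
        # Each cue starts where the previous one ended.
        start_ms = total_ms * elapsed_chars // total_chars
        elapsed_chars += len(line)
        end_ms = total_ms * elapsed_chars // total_chars
        blocks.append(
            f"{index}\n{format_ts(start_ms)} --> {format_ts(end_ms)}\n{line}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Placeholder caption lines and clip length, for illustration only.
    print(script_to_srt(["First caption line.", "Second caption line."], 12.0))
```

Proportional timing is a deliberate simplification: it needs nothing but the script and the clip length, at the cost of drifting wherever the delivery speeds up or pauses.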
Tools Used:
| Tool | Purpose |
|---|---|
| YouTube-to-MP3 converter | Extract public audio |
| Transcription tool | Get accurate transcript |
| ChatGPT | Analyze voice & write script |
| ElevenLabs | Voice cloning & synthesis |
| GPT-4o | Image generation |
| HeyGen | Lip-sync video creation |
| Premiere Pro (optional) | Final video edits |
This process can be adapted for any public figure with enough audio and visual reference material. It’s a full-stack demonstration of what AI can do when applied precisely.
Yes, it’s a deepfake. Yes, it’s that good.
Let me know if you want a downloadable version, a workflow diagram, or a template prompt pack for automating these steps. Here's my LinkedIn.