Advanced Workflow: Baby Face Deepfake AI Video

A Step-by-Step Guide to Full AI Deepfake Video Creation... but for babies

ICYMI: I'm breaking down how I made the video below, which was an announcement for my company, SkyFi.

This is the more technical breakdown of how I created a talking baby video of Bill Perkins without his involvement, using only public content and AI tools. Note: I work for Bill. I don't recommend doing this for strangers.

The process has three main components:

  1. Voice replication

  2. Scriptwriting in Bill’s tone

  3. Face generation and animation

Step 1: Source a High-Quality Voice Sample

  • I searched YouTube for a high-production interview with Bill Perkins that was at least 5 minutes long.

  • I selected a clip from his appearance on The Pomp Podcast, which had clean audio and good delivery.

  • I used a YouTube to MP3 converter to download a 3–5 minute audio segment from the interview. This would be used later to train the voice clone.
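
If you end up with the full interview audio rather than a pre-cut segment, the 3–5 minute sample can be cut locally. The sketch below is not what I used (I used a web converter). It assumes ffmpeg is installed and on your PATH, and the file names and the function names `build_trim_command` and `trim_clip` are hypothetical.

```python
import subprocess
from pathlib import Path

# The guide uses a 3-5 minute sample, so the sketch enforces that range.
MIN_SECONDS = 3 * 60
MAX_SECONDS = 5 * 60


def build_trim_command(source: Path, output: Path,
                       start_seconds: int, duration_seconds: int) -> list[str]:
    """Build an ffmpeg command that copies one segment out of an audio file."""
    if start_seconds < 0:
        raise ValueError("start_seconds must not be negative")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError("sample should be between 3 and 5 minutes long")
    return [
        "ffmpeg",
        "-ss", str(start_seconds),    # where the segment starts in the source
        "-t", str(duration_seconds),  # how many seconds to keep
        "-i", str(source),
        "-c", "copy",                 # copy the audio stream without re-encoding
        str(output),
    ]


def trim_clip(source: Path, output: Path,
              start_seconds: int, duration_seconds: int) -> None:
    """Run the trim; raises CalledProcessError if ffmpeg fails."""
    command = build_trim_command(source, output, start_seconds, duration_seconds)
    subprocess.run(command, check=True)
```

For example, `trim_clip(Path("interview.mp3"), Path("sample.mp3"), 600, 240)` would keep four minutes starting at the ten-minute mark. Pick a stretch where only the target speaker is talking.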

Step 2: Analyze Voice and Tone from Transcript

  • I took the same YouTube link and used a transcription tool to get a clean text transcript of Bill’s segment.

  • I uploaded the transcript to ChatGPT (GPT-4o) and asked:

    Prompt: “Please give me a detailed voice and tone breakdown of Bill Perkins based on this transcript.”

  • The output gave me a detailed summary of his speaking style, personality, phrasing, and delivery cadence. That summary becomes part of your scriptwriting prompt in the next step. Literally copy and paste it.

Step 3: Scriptwriting in Bill’s Voice

  • I uploaded that tone description into ChatGPT along with a real announcement I wanted him to comment on (in this case, a DIU government contract we secured).

  • I prompted GPT to write:

    Prompt: “Write three short scripts reacting to this announcement in the voice and tone of Bill Perkins. [insert voice and tone description]”

  • I selected the one that best fit the video I wanted to create. I always ask for three or more options.
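
If you reuse the prompts from Steps 2 and 3 often, it can help to keep them as templates. This is a minimal sketch that only assembles the prompt text for pasting into ChatGPT; it does not call any API. The function names and the exact layout of the script prompt are my own assumptions, and the wording follows the prompts quoted above.

```python
TONE_PROMPT = (
    "Please give me a detailed voice and tone breakdown of {speaker} "
    "based on this transcript.\n\n{transcript}"
)

SCRIPT_PROMPT = (
    "Write {count} short scripts reacting to this announcement "
    "in the voice and tone of {speaker}.\n\n"
    "Voice and tone description:\n{tone}\n\n"
    "Announcement:\n{announcement}"
)


def build_tone_prompt(speaker: str, transcript: str) -> str:
    """Step 2: ask for a voice and tone breakdown of the transcript."""
    return TONE_PROMPT.format(speaker=speaker, transcript=transcript.strip())


def build_script_prompt(speaker: str, tone: str, announcement: str,
                        count: int = 3) -> str:
    """Step 3: ask for several script options in the speaker's tone."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return SCRIPT_PROMPT.format(
        count=count,
        speaker=speaker,
        tone=tone.strip(),
        announcement=announcement.strip(),
    )
```

The default of three options matches my habit of always asking for three or more scripts and picking the best one.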

Step 4: Train a Voice Clone in ElevenLabs

  • I went to ElevenLabs and created a new custom voice.

    • Voices -> + -> Instant Voice Clone

  • I uploaded at least three minutes of the MP3 clip from earlier to train the model.

    • Tip: ElevenLabs needs clear, clean audio of the same speaker for best results.

  • Once trained, I pasted the AI-generated script into ElevenLabs and generated Bill’s voice using the cloned model.

    • You can adjust pitch, inflection, and style to get it closer to how he naturally speaks.

Step 5: Generate a Babyface Image of Bill Perkins

  • I opened GPT-4o and uploaded multiple images of Bill Perkins from different angles (via Google Images).

    • Tip: Uploading 2–3 well-lit, forward-facing images improves accuracy.

  • I clicked “Create Image” and prompted:

    Prompt: “Make an ultra-realistic baby version of this person.”

    • If you don't see “Create Image” as an option when you prompt, you need to upgrade to ChatGPT Plus ($20/month).

  • I refined the result with direct feedback like:

    Prompt:

    “Make the baby smile more”
    “Add more hair”
    “Reduce puffiness in the cheeks”

  • I iterated until the image resembled a baby version of Bill that felt recognizable. For the video, I even used a SkyFi hat that he often wears.

Step 6: Animate the Face with HeyGen

  • I logged into HeyGen and selected the Photo-to-Video AI (IV) tool.

  • Workflow inside HeyGen:

    1. Upload the baby image on the left panel

    2. Upload the MP3 audio (Bill’s AI-cloned voice) on the right

    3. Click “Generate Video”

  • Within minutes, I had a fully animated deepfake baby video of our cofounder speaking with his tone, his voice, and my chosen script. He approved.

Optional: Post-Production Polish

To finalize the video:

  • I imported the rendered clip into Premiere Pro.

  • I added:

    • Captions

    • A branded logo

    • A headline overlay referencing the DIU news

  • This part is optional, but highly recommended for polish if you’re publishing to social platforms or embedding in newsletters.

Tools Used:

Tool                       Purpose
YouTube > MP3              Extract public audio
Transcription Tool         Get accurate transcript
ChatGPT                    Analyze voice & write script
ElevenLabs                 Voice cloning & synthesis
GPT-4o                     Image generation
HeyGen                     Lip-sync video creation
Premiere Pro (optional)    Final video edits

This process can be adapted for any public figure with enough audio and visual reference material. It’s a full-stack demonstration of what AI can do when applied precisely.

Yes, it’s a deepfake. Yes, it’s that good.

Let me know if you want a downloadable version, workflow diagram, or a template prompt pack for automating these parts. Here's my LinkedIn.