The Machine V4 — Agent #15

🎬 Spielberg
Video Production Agent

Formal specification for The Machine's video production capability layer. Closes the agency's zero-video gap with a fully automated Claude → ElevenLabs → Remotion → Shorts pipeline.

Agent #15
Status ⚙️ Spec — Ready to Build
Created May 10, 2026
Author Hermes (Orchestrator)
SOUL.md machine/agents/v4/video/SOUL.md
Stack Claude · ElevenLabs · Remotion · FFmpeg · HeyGen (P2)

Why Spielberg — Why Now

Three independent signals converged this week, all pointing to the same conclusion: The Machine needs a video agent, and the stack is proven.

$0
Current video capability
$18.5K
Monthly revenue validated from 5 YouTube clients
$0.80
Cost per video at scale
6,250×
Price-to-cost multiple at premium pricing ($5K/video vs. $0.80/video cost)
⚡ Market Validation — Two Independent Operators, Same Stack

@ridark_eth: "Claude + ElevenLabs + Premiere Pro automated video factory. $0.80 cost per video, $5K sale price. Zero editing skills required."

@gippp69: A 20-year-old charges 5 YouTube clients $18,500/month using only $74/month in tools. Stack: Claude writes scripts, Python automates voiceovers and asset creation, Premiere Pro auto-cuts to Shorts. Output: 6 content pieces per idea per week.

✅ Scout Verdict — IMPLEMENT (Score: 88–91/100)

Both signals scored in the top tier of the weekly Twitter intelligence batch. The Spielberg agent concept has moved from hypothesis to validated real-world business model. The only question is how fast we build it.

The Machine currently has zero video capability. Meanwhile video is the #1 content format by engagement across YouTube, LinkedIn, Instagram Reels, and TikTok. Every week without Spielberg is a week The Machine's clients are missing the highest-engagement format in digital marketing.

Who Is Spielberg

🎬
Video Production Director
Takes a brief, outputs a finished publish-ready video. Script, voiceover, visual composition, export. The whole pipeline in one agent.
Core Role
⚙️
Factory-Grade Output
Designed for volume production. 12–30 videos per client per month. Templated, reproducible, cost-tracked. Not one-off creative — systematic manufacturing.
Philosophy
🎯
One Message Per Video
Hook in ≤3 seconds. Deliver the value within 60 seconds. One CTA. No multi-topic sprawl. Short-form discipline, applied to every format.
Guardrail
🔗
Production, Not Strategy
Spielberg executes Warhol's creative direction and Ogilvy's copy. Production serves strategy — never freelances on brand decisions.
Boundary

Orchestration Modes

Mode | Trigger | Behavior
SOLO | Direct brief from Robert or Hermes | Independent end-to-end production: brief → script → VO → render → export
ORCHESTRATED | Campaign ID provided in context | Receives brief from Porter, script from Ogilvy/Carnegie, creative direction from Warhol; delivers to Vee
REPURPOSE | Existing video file or YouTube URL | Transcript extraction → hook scoring → 3× Shorts cuts + captions baked in
BATCH | Queue in Supabase video_queue | Processes multiple briefs in priority order; logs cost per run
VALIDATION | QA review request | Evaluates script quality, audio fidelity, render artifacts, caption accuracy for existing video
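BATCH mode's "priority order" can be sketched as a sort over queued rows. This is a minimal sketch, assuming a lower priority number means "process sooner" (the schema's DEFAULT 5 does not say which direction wins) and that ties go to the oldest brief; the QueuedBrief class and next_batch helper are hypothetical names, not part of any existing Machine code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QueuedBrief:
    # Mirrors a subset of video_queue columns; field names follow the schema.
    id: str
    priority: int
    created_at: datetime
    status: str = "queued"


def next_batch(rows: list[QueuedBrief], limit: int = 10) -> list[QueuedBrief]:
    """Queued briefs in processing order.

    ASSUMPTION: lower priority number = higher urgency; ties broken by age.
    """
    queued = [r for r in rows if r.status == "queued"]
    return sorted(queued, key=lambda r: (r.priority, r.created_at))[:limit]
```

If several n8n workers ever pull from video_queue at the same time, claiming rows in Postgres with SELECT ... FOR UPDATE SKIP LOCKED is one common way to keep two workers from producing the same brief.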

Full Capability Brief

Script Generation

Spielberg generates production-formatted scripts — not essays, not blog posts. Every script has timing markers, visual cues, and embedded production instructions.
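One way to make "timing markers, visual cues, and embedded production instructions" concrete is a small script.json structure. The sketch below is illustrative only: the Segment/Script classes and field names (start_s, visual_cue, and so on) are hypothetical, and the 3-second hook limit is taken from the "One Message Per Video" guardrail.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    start_s: float    # timing marker: segment start, seconds from video start
    end_s: float      # timing marker: segment end, seconds from video start
    narration: str    # text sent to voiceover synthesis
    visual_cue: str   # production instruction for the render template


@dataclass
class Script:
    hook: Segment
    body: list[Segment]
    cta: Segment

    def segments(self) -> list[Segment]:
        """Hook, body segments, and CTA in playback order."""
        return [self.hook, *self.body, self.cta]

    def vo_text(self) -> str:
        """Narration only, joined in order: the text for the voiceover step."""
        return " ".join(s.narration for s in self.segments())

    def timing_problems(self, max_hook_s: float = 3.0) -> list[str]:
        """Human-readable timing problems; an empty list means the markers are consistent."""
        problems = []
        hook_len = self.hook.end_s - self.hook.start_s
        if hook_len > max_hook_s:
            problems.append(f"hook runs {hook_len:.1f}s (limit {max_hook_s:.1f}s)")
        segs = self.segments()
        for s in segs:
            if s.end_s <= s.start_s:
                problems.append(f"segment at {s.start_s}s has no duration")
        for prev, nxt in zip(segs, segs[1:]):
            if nxt.start_s < prev.end_s:
                problems.append(f"segment at {nxt.start_s}s overlaps the previous one")
        return problems

    def to_json(self) -> str:
        """Serialize for script.json (dataclasses.asdict recurses into the body list)."""
        return json.dumps(asdict(self), indent=2)
```

Keeping narration and visual cues in the same record means the voiceover step and the render step read from one source of truth, so a timing edit cannot drift between them.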

Voiceover Synthesis (ElevenLabs)

Eliminates voiceover talent cost ($150–500/video for professional VO). Produces broadcast-quality narration from text.

POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
Content-Type: application/json
xi-api-key: {ELEVEN_API_KEY}

{
  "text": "{script_text}",
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.8
  }
}
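The request above can be issued from Python with only the standard library. This is a sketch, not the production client: build_tts_request and synthesize are hypothetical helper names, the Accept: audio/mpeg header and the 120-second timeout are assumptions, and the endpoint, xi-api-key header, model_id, and voice_settings are copied from the request shown.

```python
import json
import urllib.request

ELEVEN_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(voice_id: str, script_text: str, api_key: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.8) -> urllib.request.Request:
    """Build (but do not send) the text-to-speech POST from the spec."""
    body = {
        "text": script_text,
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
    }
    return urllib.request.Request(
        ELEVEN_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "xi-api-key": api_key,
            "Accept": "audio/mpeg",  # ASSUMPTION: request MP3 audio back
        },
        method="POST",
    )


def synthesize(voice_id: str, script_text: str, api_key: str,
               out_path: str = "voiceover.mp3") -> int:
    """Send the request and write the audio bytes; returns the byte count."""
    req = build_tts_request(voice_id, script_text, api_key)
    with urllib.request.urlopen(req, timeout=120) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Separating request construction from sending keeps the payload testable without spending characters against the ElevenLabs quota; in n8n the same payload maps onto an HTTP Request node.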

Programmatic Rendering (Remotion — Phase 1)

Videos become reproducible, versionable, templatable code artifacts. Rendered headlessly via Remotion CLI — no manual editing required.
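A sketch of how template selection plus a headless render could be driven from Python, assuming the three Phase 1 starter templates are registered as Remotion compositions. The composition IDs, the (platform, format) keys, the src/index.ts entry point, and the prop names are all hypothetical. The command shape `npx remotion render <entry> <composition-id> <output> --props=<json>` follows the Remotion CLI.

```python
import json

# HYPOTHETICAL composition IDs for the three Phase 1 starter templates.
TEMPLATES = {
    ("youtube_shorts", "text_motion"): "ShortsTextMotion",
    ("youtube", "explainer"): "YouTubeExplainer",
    ("linkedin", "quote_card"): "LinkedInQuoteCard",
}


def remotion_render_cmd(platform: str, fmt: str, props: dict,
                        out_path: str = "final_video.mp4",
                        entry: str = "src/index.ts") -> list[str]:
    """Argument list for a headless Remotion render of the matching template."""
    try:
        composition = TEMPLATES[(platform, fmt)]
    except KeyError:
        raise ValueError(f"no Remotion template for platform={platform!r}, format={fmt!r}") from None
    # Props (audio, brand colors, logo, overlays) are passed as one JSON string.
    return ["npx", "remotion", "render", entry, composition, out_path,
            f"--props={json.dumps(props)}"]
```

Run it with subprocess.run(cmd, check=True, cwd="video-templates"). Passing the argument list directly (no shell) keeps the JSON props clear of shell quoting problems. How a template resolves the audio path (for example via Remotion's public/ folder and staticFile()) is a template-level decision.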

Shorts Repurposing (FFmpeg)

Turns every long-form video into 3 platform-native Shorts cuts automatically. Near-zero cost.

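# FFmpeg honors only the last -vf given for a stream, so crop, scale and caption burn must share one comma-separated filter chain.
ffmpeg -i input.mp4 \
  -ss {START} -t {DURATION} \
  -vf "crop=ih*9/16:ih,scale=1080:1920,subtitles=captions.srt:force_style='FontName=Inter,FontSize=18'" \
  -c:v libx264 -crf 23 -preset fast \
  -c:a aac -b:a 192k \
  output_short_{n}.mp4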
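The repurposing flow (pick the strongest moments, cut, caption) can be sketched end to end. A minimal sketch, with assumptions: each transcript segment already carries a hook_score from whatever scorer implements "hook strength" (not defined here), clips are a fixed 45 s (a hypothetical default under the ≤60s Shorts spec), and -ss is placed before -i. With input seeking FFmpeg re-encodes frame-accurately and the clip's timestamps start at zero, which is why each clip gets its own re-timed SRT instead of the full-length captions file.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    start: float        # seconds from start of the long-form video
    end: float
    text: str
    hook_score: float   # ASSUMPTION: filled in by an upstream hook scorer


def pick_clips(segments, n=3, clip_len=45.0):
    """Greedy: highest-scoring segments start clips; skip windows overlapping a chosen one."""
    chosen = []
    for seg in sorted(segments, key=lambda s: s.hook_score, reverse=True):
        window = (seg.start, seg.start + clip_len)
        if all(window[1] <= a or window[0] >= b for a, b in chosen):
            chosen.append(window)
        if len(chosen) == n:
            break
    return sorted(chosen)


def srt_time(t):
    """Seconds -> SRT timestamp HH:MM:SS,mmm."""
    ms = round(t * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def clip_srt(segments, start, end):
    """Captions for one clip, re-timed so the clip starts at 00:00:00,000."""
    blocks, idx = [], 1
    for seg in segments:
        if seg.end <= start or seg.start >= end:
            continue  # segment falls outside this clip
        a = max(seg.start, start) - start
        b = min(seg.end, end) - start
        blocks.append(f"{idx}\n{srt_time(a)} --> {srt_time(b)}\n{seg.text}\n")
        idx += 1
    return "\n".join(blocks)


def short_cmd(src, start, duration, srt_path, out_path):
    """FFmpeg args for one 9:16 Short. Keep srt_path free of ':' (filter-syntax escaping)."""
    vf = ("crop=ih*9/16:ih,scale=1080:1920,"
          f"subtitles={srt_path}:force_style='FontName=Inter,FontSize=18'")
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src, "-t", f"{duration:.3f}",
            "-vf", vf, "-c:v", "libx264", "-crf", "23", "-preset", "fast",
            "-c:a", "aac", "-b:a", "192k", out_path]
```

Write each clip_srt result to clip_{n}.srt, then run short_cmd with subprocess.run(..., check=True). The greedy picker is a simple stand-in for "score segments by hook strength", not a claim about how Spielberg must rank them.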

Avatar Presentation (HeyGen — Phase 2)

Photorealistic AI talking-head without any camera or talent. Solves the human presenter problem at scale.

Platform Specs Reference

Platform | Format | Resolution | Duration | Aspect
YouTube Long-Form | MP4 H.264 | 1920×1080 | 3–20 min | 16:9
YouTube Shorts | MP4 H.264 | 1080×1920 | ≤60s | 9:16
Instagram Reels | MP4 H.264 | 1080×1920 | ≤90s | 9:16
TikTok | MP4 H.264 | 1080×1920 | ≤3 min | 9:16
LinkedIn Video | MP4 H.264 | 1920×1080 | 30s–10 min | 16:9
Instagram Feed | MP4 H.264 | 1080×1080 | ≤60s | 1:1

Audio standard across all platforms: AAC, 192 kbps, -14 LUFS integrated loudness
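The -14 LUFS target can be applied with FFmpeg's loudnorm filter. A minimal single-pass sketch: the true-peak (-1.5 dBTP) and loudness-range (11 LU) values are assumptions rather than figures from this spec, and libmp3lame is assumed to be present in the FFmpeg build (it is needed only because the step outputs voiceover.mp3). The explicit -ar matters because loudnorm can raise the sample rate internally; setting it pins the output to a normal rate.

```python
def loudnorm_cmd(src: str, dst: str = "voiceover.mp3",
                 target_lufs: float = -14.0, true_peak: float = -1.5,
                 lra: float = 11.0, sample_rate: int = 44100) -> list[str]:
    """FFmpeg args that normalize a voiceover to the platform loudness target.

    ASSUMPTION: TP and LRA defaults here are illustrative, not spec values.
    """
    af = f"loudnorm=I={target_lufs:g}:TP={true_peak:g}:LRA={lra:g}"
    return ["ffmpeg", "-y", "-i", src,
            "-af", af,                  # EBU R128 loudness normalization
            "-ar", str(sample_rate),    # pin the output sample rate
            "-c:a", "libmp3lame", "-b:a", "192k",
            dst]
```

Single-pass loudnorm adjusts dynamically; if measured loudness drifts from target, a two-pass run (measure first, then feed the measured values back into the filter) lands closer to -14 LUFS.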

The Full Pipeline

Every Spielberg production run follows this sequence. Phase 1 steps are active now. Phase 2 steps are additive — they layer onto Phase 1 without replacing it.

1
Brief Intake + Validation
Receive brief (from Hermes, Porter, or direct). Load the client's Brand Context Pack. Validate the brief against the schema: are all required fields present? If not, reject it with a specific error.
Video Brief Schema · Brand Context Pack · Supabase video_queue
2
Script Generation
Generate 3 hook options, select strongest. Write body with timing markers and visual cues embedded. Write CTA. Output: script.json + script.md (human-readable).
Claude Sonnet · Hook library · Production markers
3
Voiceover Synthesis
Call ElevenLabs API with script text. Select voice from approved library or use client clone. Normalize audio to -14 LUFS. Output: voiceover.mp3.
ElevenLabs API · eleven_turbo_v2_5 · FFmpeg normalize
4
Video Render
Select Remotion template based on platform + format. Inject audio, brand colors, logo, text overlays, timing. Render via Remotion headless CLI. Output: final_video.mp4.
Remotion CLI · Remotion templates · FFmpeg export
5
Shorts Cuts (if long-form)
Extract transcript. Score segments by hook strength. FFmpeg cut + 9:16 crop + caption burn × 3 clips. Output: short_1.mp4, short_2.mp4, short_3.mp4.
FFmpeg · Whisper transcript · Hook scoring
6
Thumbnail Brief → Warhol
Generate thumbnail brief (video title + hook text + brand context). Pass to Warhol. Warhol handles image generation via the standard Visual ICP Gate. Spielberg receives the rendered image path.
Warhol handoff · Visual ICP Gate
7
QA Gate (Hemingway)
Checks: script accuracy vs. audio (spot-check 3 segments); caption correctness (sample 10 captions); render artifacts (first, middle, and last frames); duration within 10% of target. The package must pass Hemingway before delivery.
Hemingway QA
8
Handoff + Logging
Pass final files to Vee (social distribution). Log production run to Supabase: client, platform, duration, cost, output paths. Notify Hermes via Telegram on completion.
Vee distribution · Supabase video_production · Telegram
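The mechanical part of the step 7 QA gate (what to sample and whether duration is in tolerance) is easy to pin down in code; the judgment itself stays with Hemingway. A minimal sketch: qa_plan and its helpers are hypothetical names, frame indices are 0-based, and the sample sizes (3 segments, 10 captions) and 10% tolerance come from step 7.

```python
import random


def duration_ok(actual_s: float, target_s: float, tolerance: float = 0.10) -> bool:
    """True if the render is within ±tolerance of the target duration."""
    return abs(actual_s - target_s) <= tolerance * target_s


def qa_frame_indices(total_frames: int) -> list[int]:
    """0-based first, middle and last frame for the render-artifact check."""
    if total_frames < 1:
        raise ValueError("video has no frames")
    return sorted({0, total_frames // 2, total_frames - 1})


def sample_for_review(items: list, k: int, seed=None) -> list:
    """Random sample of k items (all of them if fewer) for spot-checks."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


def qa_plan(actual_s, target_s, total_frames, captions, script_segments, seed=None) -> dict:
    """What Hemingway is asked to review for one production run."""
    return {
        "duration_ok": duration_ok(actual_s, target_s),
        "frames": qa_frame_indices(total_frames),
        "captions_to_check": sample_for_review(captions, 10, seed),
        "segments_to_check": sample_for_review(script_segments, 3, seed),
    }
```

Passing a seed makes the sample reproducible, so a rejected run can be re-checked against the same captions and segments after a fix.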

Integration Architecture

Spielberg is the production layer in the V4 pipeline. It does not own strategy (Porter), copy (Ogilvy/Carnegie), creative direction (Warhol), or distribution (Vee). It receives from upstream, executes production, and hands off downstream.

Upstream — Receives From

Porter (Strategy): Campaign brief with video content objectives
Ogilvy (Copy): Script narrative for company-voice content
Carnegie (Personal Brand): Script narrative for Robert-voice content
Warhol (Creative): Visual direction (colors, motion style, brand codes) + rendered thumbnails
Scout (Research): Trending topics, hook angles, competitive intel
🎬 Spielberg
Video Production
Agent #15

Downstream — Delivers To

Vee (Social): Final video files + captions + thumbnails for platform scheduling
Hemingway (QA): Complete production package for review before any delivery
Covey (PM): Production logs, cost report, schedule status
Hermes (Orchestrator): Completion notification + Telegram summary per production run

Warhol Relationship — Creative Direction, Not Ownership

Role Boundary — Critical

Warhol briefs → Spielberg executes. Warhol owns the visual identity (colors, motion language, brand codes, aesthetic register). Spielberg executes production within that creative direction. Thumbnail generation: Spielberg writes the brief → Warhol generates the image → Spielberg receives the file path. Neither agent overrides the other. Conflicts escalate to Robert via Hermes.

Vee Relationship — Production vs. Distribution

Spielberg is the factory. Vee is the shelf. Spielberg delivers final assets: final_video.mp4, shorts cuts, thumbnail, caption file (.vtt), recommended hashtags. Vee handles platform scheduling, posting, and performance tracking. Spielberg never schedules or posts.

n8n Workflow Position

Spielberg runs as an n8n workflow triggered by:

Phase 1 vs Phase 2 Scope

Phase 1
MVP — Build Now
🟠 Build Target: Q2 2026

The core pipeline that gets Spielberg operational and billing. Proves the model before investing in the full avatar/composite layer.

  • Script generation via Claude Sonnet (hook + body + CTA, production-formatted)
  • Voiceover synthesis via ElevenLabs API (default voice library)
  • Remotion rendering from 3 starter templates (Shorts text-motion, YouTube explainer, LinkedIn quote-card)
  • FFmpeg Shorts cutting from long-form (9:16 crop, caption burn, 3 cuts/video)
  • Thumbnail brief → Warhol handoff
  • Hemingway QA gate integration
  • Supabase cost + production logging
  • Vee handoff for distribution
  • Telegram production completion notification
  • ICON Golf Cars or Terry's Marine as Phase 1 pilot client
Phase 2
Full Pipeline
🔵 Target: Q3 2026

Layer on the avatar/composite pipeline once Phase 1 is proven and billing. Premium positioning, higher production value, HeyGen avatar presenter.

  • Client voice cloning (ElevenLabs clone from 2-min sample)
  • HeyGen avatar generation via API (photorealistic talking-head)
  • Avatar + motion graphics composite in Remotion
  • Full 1080p / 4K YouTube output capability
  • A/B variant production (avatar-led vs. text-motion) for performance testing
  • Premiere Pro automation layer (ExtendScript/UXP) as alternative render path
  • 5+ Remotion template library expansion
  • Multi-client batch queue with priority scheduling
  • Video performance feedback loop (Vee reports → Scout flags winning formats → Spielberg adjusts templates)

Pilot Client Recommendation

Scout identified two existing clients as ideal Phase 1 pilots:

ICON Golf Cars
High visual appeal product, existing content pipeline in The Machine, YouTube channel opportunity. Start with 4 YouTube explainers + 12 Shorts/month at $2,500–$3,500/mo pilot rate.
Pilot #1
Terry's Marine
Phase 2 website redesign already in progress — natural extension to video content. Boat/marina content performs well on YouTube. Good for Shorts repurposing workflow test.
Pilot #2

Cost Model + Client Pricing

Per-Video Production Cost

Component | Phase 1 Cost | Phase 2 Cost | Notes
Script Generation (Claude Sonnet) | ~$0.05–0.15 | ~$0.05–0.15 | 1,000–2,000 tokens input/output
Voiceover (ElevenLabs) | ~$0.10–0.30 | ~$0.10–0.30 | ~1,000 chars ≈ 90s audio
Remotion Rendering | ~$0.10–0.20 | ~$0.10–0.20 | Local Mac Studio compute; near-free
Shorts Cuts (FFmpeg) | ~$0.01–0.05 | ~$0.01–0.05 | Local compute, essentially free
HeyGen Avatar | N/A | ~$0.25–0.50 | API tier dependent; Phase 2 only
Total per video | ~$0.26–0.70 | ~$0.51–1.20 | Under $2 at scale in both phases
📊 Margin Reality Check

At $0.70/video cost and a $500 client price, price is about 714× production cost. At $0.80/video cost and a $5,000 price (premium positioning validated by @ridark_eth), the multiple is 6,250×. Even the conservative model is extraordinary. The constraint is client acquisition and production capacity, not cost.
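The cost totals and price multiples above can be re-derived from the component ranges. A small sketch: the dictionaries copy the per-video ranges from the cost table (variable cost only; monthly tool subscriptions listed under Tools Required are not included), and total_range and price_to_cost are hypothetical helper names.

```python
# Per-component (low, high) per-video cost in USD, copied from the cost table.
PHASE1 = {
    "script": (0.05, 0.15),
    "voiceover": (0.10, 0.30),
    "render": (0.10, 0.20),
    "shorts": (0.01, 0.05),
}
PHASE2 = {**PHASE1, "avatar": (0.25, 0.50)}  # Phase 2 adds HeyGen


def total_range(components: dict) -> tuple:
    """Sum the low and high ends separately, rounded to cents."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 2), round(high, 2)


def price_to_cost(price: float, cost: float) -> float:
    """How many times production cost the client price is."""
    return price / cost
```

total_range(PHASE1) gives (0.26, 0.7) and total_range(PHASE2) gives (0.51, 1.2), matching the table's totals; the multiples round to 714 and 6,250.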

Recommended Client Packages

Starter
$2,500
per month
  • 12 Shorts (60s, platform-native)
  • 2 Long-form YouTube videos
  • Thumbnails included
  • Captions on all content
  • Distribution via Vee
~$2,480 gross margin/mo
Authority
$7,500
per month
  • 8 Long-form YouTube videos
  • 24 Shorts clips
  • Thumbnails + SEO optimization
  • Voice clone (client's voice)
  • Channel management included
  • Full analytics + reporting
~$7,460 gross margin/mo
💰 Revenue Target

Benchmark: the 20-year-old operator cited above reports $18,500/mo from 5 YouTube clients (avg $3,700/client) using a comparable Claude-based stack. Conservative target: 5 video clients × $3,700 avg = $18,500/mo incremental MRR from Spielberg alone, a 68% increase on the current $27K/mo baseline.

Tools Required

🔊
ElevenLabs API
Role: Voice synthesis + client voice cloning
Phase: P1 (default voices) + P2 (clone)
Cost: ~$5–22/mo (Starter to Creator tier)
API: REST, per-character pricing
Status: Production-ready
Phase 1 Required
⚛️
Remotion
Role: Programmatic video rendering in React/TypeScript
Phase: P1 (templates) + P2 (full composite)
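Cost: Free under the Remotion License for individuals and small teams; companies above the license's headcount threshold need a paid company license. Compute is local Mac Studio.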
API: Headless CLI or Node.js programmatic
Status: Production-ready
Phase 1 Required
🎞️
FFmpeg
Role: Audio processing, Shorts cutting, format conversion, caption burn
Phase: P1
Cost: Free open-source
API: CLI, callable from Node/Python/n8n
Status: Already available on Mac Studio
Phase 1 Required
🤖
HeyGen API
Role: Photorealistic AI avatar generation (talking-head presenter)
Phase: P2 only
Cost: ~$0.25–0.50/video at API tier
API: REST v2, POST to /v2/video/generate
Status: Production-ready; defer until P1 proven
Phase 2
🗄️
Supabase
Role: video_queue table (brief intake), video_production table (run logging, cost tracking)
Phase: P1
Cost: Existing infrastructure
Status: Already deployed, RLS hardened May 9
Phase 1 Required
🎙️
Whisper (OpenAI)
Role: Transcript extraction for Shorts repurposing and caption generation
Phase: P1
Cost: ~$0.006/min audio (API) or free local via whisper.cpp
Status: Available; recommend local whisper.cpp on Mac Studio M4 Pro
Phase 1 Required

Tool Acquisition Checklist

Tool | Action | Effort | Phase
ElevenLabs API key | Sign up at elevenlabs.io → API Keys → Create key → store in n8n credentials | 15 min | P1
Remotion | On the Mac Studio, scaffold a project with npx create-video@latest in a video-templates/ dir → build 3 starter templates | 2–4 hrs | P1
FFmpeg | Confirm installed with ffmpeg -version on the Mac Studio; install via Homebrew (brew install ffmpeg) if not. The subtitles filter used for caption burn requires a build with libass. | 5 min | P1
Supabase tables | Create video_queue and video_production tables per schema below | 30 min | P1
Whisper.cpp | Clone whisper.cpp → compile on Mac Studio M4 Pro → call it from an n8n Execute Command node | 1 hr | P1
HeyGen API key | Sign up at heygen.com → API → generate key → defer until Phase 2 approved | 15 min | P2

Supabase Table Schema

-- Video production brief queue
CREATE TABLE video_queue (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  client       TEXT NOT NULL,
  campaign_id  TEXT,
  platform     TEXT NOT NULL,
  format       TEXT NOT NULL,
  duration_target TEXT,
  topic        TEXT NOT NULL,
  message      TEXT NOT NULL,
  tone         TEXT,
  cta          TEXT,
  voice        TEXT DEFAULT 'default',
  brand_colors JSONB,
  logo_path    TEXT,
  reference_urls JSONB,
  priority     INTEGER DEFAULT 5,
  status       TEXT DEFAULT 'queued',  -- queued | in_progress | done | failed
  created_at   TIMESTAMPTZ DEFAULT now()
);

-- Production run log
CREATE TABLE video_production (
  id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  queue_id       UUID REFERENCES video_queue(id),
  client         TEXT NOT NULL,
  platform       TEXT,
  duration_s     INTEGER,
  cost_script    NUMERIC(6,4),
  cost_voiceover NUMERIC(6,4),
  cost_render    NUMERIC(6,4),
  cost_avatar    NUMERIC(6,4),
  cost_total     NUMERIC(6,4),
  output_video   TEXT,
  output_shorts  JSONB,
  output_thumb   TEXT,
  output_captions TEXT,
  status         TEXT DEFAULT 'complete',
  produced_at    TIMESTAMPTZ DEFAULT now()
);
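Step 1's "validate brief schema, reject with specific error" can be expressed against the video_queue columns. A minimal sketch: REQUIRED lists the NOT NULL columns without defaults, ALLOWED_STATUS comes from the status comment, and validate_brief/BriefError are hypothetical names. The returned dict fills the same defaults the table declares.

```python
# NOT NULL columns in video_queue that have no default (id is generated).
REQUIRED = ("client", "platform", "format", "topic", "message")
# Values listed in the video_queue.status comment.
ALLOWED_STATUS = {"queued", "in_progress", "done", "failed"}


class BriefError(ValueError):
    """Raised with a specific, human-readable rejection reason."""


def validate_brief(brief: dict) -> dict:
    """Return the brief with table defaults applied, or raise BriefError."""
    missing = [f for f in REQUIRED if not str(brief.get(f) or "").strip()]
    if missing:
        raise BriefError(f"brief rejected: missing required field(s): {', '.join(missing)}")
    priority = brief.get("priority", 5)
    if not isinstance(priority, int) or isinstance(priority, bool):
        raise BriefError(f"brief rejected: priority must be an integer, got {priority!r}")
    status = brief.get("status", "queued")
    if status not in ALLOWED_STATUS:
        raise BriefError(f"brief rejected: unknown status {status!r}")
    # Mirror the column defaults declared in the schema above.
    return {"voice": "default", "priority": 5, "status": "queued", **brief}
```

Rejecting before the row is inserted keeps video_queue free of briefs that would fail halfway through a batch run.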

When a Spielberg Production Is Complete

A production run is not done until all of these conditions are met. Hemingway QA must sign off before any asset is delivered to a client or scheduled for distribution.

Escalation Rules

Condition | Escalate To
Brief is unclear or campaign objective ambiguous | Porter (Strategy)
Script requires deep brand voice work | Ogilvy (client) or Carnegie (Robert)
Visual direction undefined, brand codes missing | Warhol (Creative)
CTA optimization for conversion-focused video | Eisenberg (CRO)
ALL video assets before client delivery | Hemingway (QA)
New voice clone requiring client consent | Robert (approval required)
Budget approval for Phase 2 tools (HeyGen) | Robert (approval required)
Any irreversible platform action | Robert (approval required)

Hard Constraints

⛔ Non-Negotiable Rules

NEVER ship a video without Hemingway QA sign-off.
NEVER use a client's voice clone without documented consent on file.
NEVER schedule or post content — that is Vee's domain.
NEVER freelance on brand visual decisions — escalate to Warhol.
NEVER exceed $5/video production cost without Robert's approval.
NEVER activate HeyGen API without Phase 2 approval from Robert.
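Four of the hard constraints are checkable as a pre-delivery guard; the posting and brand-decision rules are role boundaries rather than data checks. A minimal sketch: the RunFlags fields are hypothetical names for facts the run log would need to carry, and the $5 cap is the figure from the rules above.

```python
from dataclasses import dataclass

COST_CAP_USD = 5.00  # per-video cap from the hard constraints


@dataclass
class RunFlags:
    # HYPOTHETICAL fields; not columns in the current video_production schema.
    cost_total: float
    hemingway_approved: bool
    uses_voice_clone: bool = False
    clone_consent_on_file: bool = False
    uses_heygen: bool = False
    phase2_approved: bool = False
    robert_cost_override: bool = False


def blocking_violations(run: RunFlags) -> list[str]:
    """Every hard-constraint breach; delivery proceeds only if this is empty."""
    v = []
    if not run.hemingway_approved:
        v.append("no Hemingway QA sign-off")
    if run.uses_voice_clone and not run.clone_consent_on_file:
        v.append("voice clone used without documented consent")
    if run.cost_total > COST_CAP_USD and not run.robert_cost_override:
        v.append(f"cost ${run.cost_total:.2f} exceeds ${COST_CAP_USD:.2f} cap without Robert's approval")
    if run.uses_heygen and not run.phase2_approved:
        v.append("HeyGen used without Phase 2 approval")
    return v
```

Returning every violation at once, instead of failing on the first, gives Hermes a complete escalation message in a single Telegram notification.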


Self-Improvement Loop

After every task where a correction was made or output was rejected:

  1. Open machine/agents/v4/video/LEARNINGS.md
  2. Add a dated entry: Pattern (what went wrong) → Rule (how to prevent recurrence) → Trigger (when the rule applies)
  3. At the start of every new production run, read LEARNINGS.md before executing
  4. If a rule applies to the current task, apply it proactively — do not wait to be corrected again

The goal: mistake rate drops over time. Every rejected video makes the next 100 better.
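The loop's "dated entry: Pattern → Rule → Trigger" format can be kept consistent with a tiny helper. A sketch only: the Markdown layout (a dated heading plus three bullets) is a hypothetical convention for LEARNINGS.md, and format_learning/append_learning/load_learnings are illustrative names.

```python
from datetime import date
from pathlib import Path


def format_learning(pattern: str, rule: str, trigger: str, on: date) -> str:
    """One dated Pattern → Rule → Trigger entry (layout is a hypothetical convention)."""
    return (f"\n## {on.isoformat()}\n"
            f"- Pattern: {pattern}\n"
            f"- Rule: {rule}\n"
            f"- Trigger: {trigger}\n")


def append_learning(path: str, pattern: str, rule: str, trigger: str, on=None) -> str:
    """Append an entry to LEARNINGS.md, creating the file and folders if needed."""
    entry = format_learning(pattern, rule, trigger, on or date.today())
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry


def load_learnings(path: str) -> str:
    """Read at the start of every run; empty string if no learnings exist yet."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""
```

For example, append_learning("machine/agents/v4/video/LEARNINGS.md", "Captions drifted on Shorts", "Re-time the SRT per clip", "Any FFmpeg Shorts cut") records one correction, and load_learnings returns the accumulated rules before the next production run.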