Spielberg — Agent #15 Spec

Signal Intelligence

Why Spielberg — Why Now

Three independent signals converged this week, all pointing to the same conclusion: The Machine needs a video agent, and the stack is proven.

$0

Current video capability

$18.5K

Monthly revenue one operator reports from 5 YouTube clients on the same stack

$0.80

Cost per video at scale

6,250×

Price-to-cost multiple at premium pricing ($5K/video vs. $0.80 cost)

⚡ Market Validation — Two Independent Operators, Same Stack

@ridark_eth: "Claude + ElevenLabs + Premiere Pro automated video factory. $0.80 cost per video, $5K sale price. Zero editing skills required."

@gippp69: A 20-year-old charges 5 YouTube clients $18,500/month using only $74/month in tools. Stack: Claude writes scripts, Python automates voiceovers and asset creation, Premiere Pro auto-cuts to Shorts. Output: 6 content pieces per idea per week.

✅ Scout Verdict — IMPLEMENT (Score: 88–91/100)

Both signals scored in the top tier of the weekly Twitter intelligence batch. The Spielberg agent concept has moved from hypothesis to validated real-world business model. The only question is how fast we build it.

The Machine currently has zero video capability. Meanwhile video is the #1 content format by engagement across YouTube, LinkedIn, Instagram Reels, and TikTok. Every week without Spielberg is a week The Machine's clients are missing the highest-engagement format in digital marketing.

Agent Identity

Who Is Spielberg

🎬

Video Production Director

Takes a brief, outputs a finished publish-ready video. Script, voiceover, visual composition, export. The whole pipeline in one agent.

Core Role

⚙️

Factory-Grade Output

Designed for volume production. 12–30 videos per client per month. Templated, reproducible, cost-tracked. Not one-off creative — systematic manufacturing.

Philosophy

🎯

One Message Per Video

Hook in ≤3 seconds. Deliver the core value within 60 seconds. One CTA. No multi-topic sprawl. The discipline of short-form applied to every format.

Guardrail

🔗

Production, Not Strategy

Spielberg executes Warhol's creative direction and Ogilvy's copy. Production serves strategy — never freelances on brand decisions.

Boundary

Orchestration Modes

Mode	Trigger	Behavior
SOLO	Direct brief from Robert or Hermes	Independent end-to-end production: brief → script → VO → render → export
ORCHESTRATED	Campaign ID provided in context	Receives brief from Porter, script from Ogilvy/Carnegie, creative direction from Warhol, delivers to Vee
REPURPOSE	Existing video file or YouTube URL	Transcript extraction → hook scoring → 3× Shorts cuts + captions baked in
BATCH	Queue in Supabase `video_queue`	Processes multiple briefs in priority order; logs cost per run
VALIDATION	QA review request	Evaluates script quality, audio fidelity, render artifacts, and caption accuracy for an existing video

What Spielberg Can Do

Full Capability Brief

Script Generation

Spielberg generates production-formatted scripts — not essays, not blog posts. Every script has timing markers, visual cues, and embedded production instructions.

Hook formats: Bold claim · surprising stat · direct question · contrarian take · "here's what nobody tells you"
Body structure: 3–5 value beats, each with [VISUAL: ...] and [TEXT OVERLAY: ...] markers
CTA: One action, stated in the last 5–10 seconds; never two CTAs
Duration targeting: 30s / 60s / 3-min / 10-min — same architecture, different density
Hook testing: Generates 3 hook options, selects strongest before scripting the full video

Voiceover Synthesis (ElevenLabs)

Eliminates voiceover talent cost ($150–500/video for professional VO). Produces broadcast-quality narration from text.

Default voices: Approved library — warm-authoritative (B2B), conversational (educational), energetic (Shorts)
Client voice cloning: Clone client's voice from a 2-minute audio sample for brand-consistent narration
Output standard: WAV or MP3, 44.1kHz, normalized to -14 LUFS (the loudness reference YouTube and Spotify normalize playback to)
Cost: ~$0.10–0.30/video (1,000 chars ≈ 90s audio at natural speech rate)

POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
Content-Type: application/json
xi-api-key: {ELEVEN_API_KEY}

{
  "text": "{script_text}",
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.8
  }
}
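From a Python worker, the same request can be built with the standard library alone. A minimal sketch, assuming the key lives in an `ELEVEN_API_KEY` environment variable and that the response body is the audio stream (requested as MP3 via the `Accept` header); `build_tts_request` and `synthesize` are hypothetical helper names, and the voice settings simply mirror the JSON above.

```python
import json
import os
import urllib.request

ELEVEN_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(script_text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST request from the spec; no network I/O happens here."""
    body = {
        "text": script_text,
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
    }
    return urllib.request.Request(
        ELEVEN_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
            "xi-api-key": api_key,
        },
        method="POST",
    )


def synthesize(script_text: str, voice_id: str, out_path: str = "voiceover.mp3") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    api_key = os.environ["ELEVEN_API_KEY"]  # assumption: key comes from the environment
    req = build_tts_request(script_text, voice_id, api_key)
    with urllib.request.urlopen(req, timeout=120) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

Keeping request construction separate from the network call makes the payload easy to inspect in n8n logs or tests.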

Programmatic Rendering (Remotion — Phase 1)

Videos become reproducible, versionable, templatable code artifacts. Rendered headlessly via Remotion CLI — no manual editing required.

Input: Script JSON + audio file path + brand color palette + logo path
Templates (Phase 1):
- shorts-text-motion.tsx — animated text, bold titles, background color blocks (Shorts/Reels/TikTok)
- explainer-b-roll.tsx — text overlays on B-roll, chapter markers (YouTube)
- quote-card-sequence.tsx — sequential quote cards with fade transitions (LinkedIn)
Output: MP4 at platform-optimal resolution and framerate
Cost: ~$0.10–0.20/video compute
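A headless render can be driven by shelling out to the Remotion CLI (`npx remotion render <entry> <composition-id> <output> --props=...`). A sketch under stated assumptions: the composition IDs are assumed to match the three template file names, the platform keys and props fields are hypothetical, and the entry point is whatever the `video-templates/` project uses. Remotion templates usually load local files through `staticFile()`, which resolves against the project's `public/` folder, so the audio and logo are assumed to be copied there first.

```python
import json

# Hypothetical platform keys; composition IDs assumed to match the template file names.
TEMPLATE_BY_PLATFORM = {
    "youtube_shorts": "shorts-text-motion",
    "instagram_reels": "shorts-text-motion",
    "tiktok": "shorts-text-motion",
    "youtube_long": "explainer-b-roll",
    "linkedin": "quote-card-sequence",
}


def build_render_command(platform: str, props: dict,
                         out_path: str = "out/final_video.mp4",
                         entry_point: str = "src/index.ts") -> list[str]:
    """Return an argv list for a headless Remotion render.

    props (script JSON, audio file, brand colors, logo) is passed inline as a JSON
    string via --props; the CLI also accepts a path to a JSON file there.
    """
    composition = TEMPLATE_BY_PLATFORM[platform]
    return [
        "npx", "remotion", "render",
        entry_point, composition, out_path,
        "--props=" + json.dumps(props),
    ]
```

Run it with `subprocess.run(cmd, cwd="video-templates", check=True)`; passing an argv list avoids shell-quoting the JSON.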

Shorts Repurposing (FFmpeg)

Turns every long-form video into 3 platform-native Shorts cuts automatically. Near-zero cost.

Process: Transcript → hook scoring → timestamp selection → FFmpeg trim + 9:16 crop + caption burn
Output: 3× MP4 clips ≤60s, captions baked, ready for Shorts/Reels/TikTok

# One -vf chain: ffmpeg applies only the last -vf given for a stream, so crop, scale, and captions are joined with commas.
ffmpeg -i input.mp4 \
  -ss {START} -t {DURATION} \
  -vf "crop=ih*9/16:ih,scale=1080:1920,subtitles=captions.srt:force_style='FontName=Inter,FontSize=18'" \
  -c:v libx264 -crf 23 -preset fast \
  -c:a aac -b:a 192k \
  output_short_{n}.mp4
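The same cut can be built as an argv list, ready for `subprocess.run` or an n8n Execute Command node. A sketch, assuming `captions.srt` covers the full long-form timeline (for example, Whisper output on the source video); `build_short_cmd` is a hypothetical helper.

```python
def build_short_cmd(src: str, start_s: float, duration_s: float, n: int = 1,
                    captions: str = "captions.srt", font: str = "Inter",
                    font_size: int = 18) -> list[str]:
    """argv for one 9:16 Short: trim, center-crop, scale, burn captions, encode."""
    # Single filter chain: a second -vf would silently replace the first one.
    vf = (
        "crop=ih*9/16:ih,scale=1080:1920,"
        f"subtitles={captions}:force_style='FontName={font},FontSize={font_size}'"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        # -ss/-t after -i (output seeking): frame-accurate, and the subtitles
        # filter still sees the source timeline, so a full-length SRT lines up.
        "-ss", f"{start_s:.3f}", "-t", f"{duration_s:.3f}",
        "-vf", vf,
        "-c:v", "libx264", "-crf", "23", "-preset", "fast",
        "-c:a", "aac", "-b:a", "192k",
        f"output_short_{n}.mp4",
    ]
```

No shell is involved, so the single quotes inside `force_style` reach ffmpeg's filtergraph parser unchanged and protect the comma there.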

Avatar Presentation (HeyGen — Phase 2)

Photorealistic AI talking-head without any camera or talent. Solves the human presenter problem at scale.

Input: Script text + audio file (from ElevenLabs)
Output: Presenter video composited into Remotion template
Cost: ~$0.25–0.50/video (API tier dependent)
Phase 2 only: Do not build until Phase 1 pipeline is proven

Platform Specs Reference

Platform	Format	Resolution	Target Duration	Aspect
YouTube Long-Form	MP4 H.264	1920×1080	3–20 min	16:9
YouTube Shorts	MP4 H.264	1080×1920	≤60s	9:16
Instagram Reels	MP4 H.264	1080×1920	≤90s	9:16
TikTok	MP4 H.264	1080×1920	≤3 min	9:16
LinkedIn Video	MP4 H.264	1920×1080	30s–10 min	16:9
Instagram Feed	MP4 H.264	1080×1080	≤60s	1:1

Audio standard across all platforms: AAC, 192kbps, -14 LUFS
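The platform table can double as data for an automated pre-QA check. A sketch: the dictionary keys and the `spec_violations` helper are hypothetical, and the durations are transcribed from the table as written (ranges kept as ranges; "≤" entries get a lower bound of 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformSpec:
    width: int
    height: int
    min_s: int  # shortest acceptable duration, seconds
    max_s: int  # longest acceptable duration, seconds


# Values transcribed from the platform specs table.
SPECS = {
    "youtube_long": PlatformSpec(1920, 1080, 3 * 60, 20 * 60),
    "youtube_shorts": PlatformSpec(1080, 1920, 0, 60),
    "instagram_reels": PlatformSpec(1080, 1920, 0, 90),
    "tiktok": PlatformSpec(1080, 1920, 0, 3 * 60),
    "linkedin": PlatformSpec(1920, 1080, 30, 10 * 60),
    "instagram_feed": PlatformSpec(1080, 1080, 0, 60),
}


def spec_violations(platform: str, width: int, height: int, duration_s: float) -> list[str]:
    """Human-readable problems with a render; an empty list means it fits the spec."""
    spec = SPECS[platform]
    problems = []
    if (width, height) != (spec.width, spec.height):
        problems.append(f"resolution {width}x{height} != {spec.width}x{spec.height}")
    if not spec.min_s <= duration_s <= spec.max_s:
        problems.append(f"duration {duration_s}s outside {spec.min_s}-{spec.max_s}s")
    return problems
```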

Production Workflow

The Full Pipeline

Every Spielberg production run follows this sequence. Phase 1 steps are active now. Phase 2 steps are additive — they layer onto Phase 1 without replacing it.

1

Brief Intake + Validation

Receive brief (from Hermes, Porter, or direct). Load Brand Context Pack for client. Validate brief schema — all required fields present? Reject with specific error if not.

Video Brief Schema · Brand Context Pack · Supabase video_queue
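Step 1's schema check could look like the sketch below. It assumes the required fields are exactly the `NOT NULL` columns of `video_queue` (`client`, `platform`, `format`, `topic`, `message`) and fills the same defaults the table applies; `validate_brief` and `BriefError` are hypothetical names.

```python
REQUIRED_FIELDS = ("client", "platform", "format", "topic", "message")


class BriefError(ValueError):
    """Raised with a specific, human-readable reason when a brief is rejected."""


def validate_brief(brief: dict) -> dict:
    """Reject briefs with missing or blank required fields; otherwise fill defaults."""
    missing = [f for f in REQUIRED_FIELDS if not str(brief.get(f) or "").strip()]
    if missing:
        raise BriefError("missing required field(s): " + ", ".join(missing))
    # Same defaults as the video_queue table (voice 'default', priority 5).
    return {"voice": "default", "priority": 5, **brief}
```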

2

Script Generation

Generate 3 hook options, select strongest. Write body with timing markers and visual cues embedded. Write CTA. Output: script.json + script.md (human-readable).

Claude Sonnet · Hook library · Production markers
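The spec does not fix a `script.json` layout. Assuming a hypothetical shape with `hook_options`, the selected `hook`, `beats` (each carrying `visual` and `text_overlay` markers), and a single `cta` string, a structural check for step 2 might look like this:

```python
def script_problems(script: dict) -> list[str]:
    """Structural checks on a (hypothetical) script.json; empty list means it passes."""
    problems = []
    options = script.get("hook_options", [])
    if len(options) != 3:
        problems.append(f"expected 3 hook options, got {len(options)}")
    if script.get("hook") not in options:
        problems.append("selected hook is not one of the hook options")
    beats = script.get("beats", [])
    if not 3 <= len(beats) <= 5:
        problems.append(f"expected 3-5 value beats, got {len(beats)}")
    for i, beat in enumerate(beats, 1):
        for marker in ("visual", "text_overlay"):
            if not beat.get(marker):
                problems.append(f"beat {i} has no {marker} marker")
    cta = script.get("cta")
    if not isinstance(cta, str) or not cta.strip():
        problems.append("script needs exactly one CTA (a single non-empty string)")
    return problems
```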

3

Voiceover Synthesis

Call ElevenLabs API with script text. Select voice from approved library or use client clone. Normalize audio to -14 LUFS. Output: voiceover.mp3.

ElevenLabs API · eleven_turbo_v2_5 · FFmpeg normalize
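For the -14 LUFS target, FFmpeg's `loudnorm` filter (EBU R128) is one option. A single-pass sketch: the true-peak ceiling of -1.0 dBTP and LRA of 11 are assumptions the spec does not set, and a two-pass run (measure first, then apply the measured values) lands closer to the target. `build_loudnorm_cmd` is a hypothetical helper.

```python
def build_loudnorm_cmd(src: str, dst: str = "voiceover.mp3", target_lufs: float = -14.0,
                       true_peak: float = -1.0, lra: float = 11.0) -> list[str]:
    """Single-pass loudness normalization of the raw voiceover."""
    af = f"loudnorm=I={target_lufs}:TP={true_peak}:LRA={lra}"
    return [
        "ffmpeg", "-y", "-i", src,
        "-af", af,
        # loudnorm resamples internally, so pin the output back to 44.1 kHz.
        "-ar", "44100",
        "-c:a", "libmp3lame", "-b:a", "192k",
        dst,
    ]
```

The true-peak ceiling is what guards against the clipping the Definition of Done rules out.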

4

Video Render

Select Remotion template based on platform + format. Inject audio, brand colors, logo, text overlays, timing. Render via Remotion headless CLI. Output: final_video.mp4.

Remotion CLI · Remotion templates · FFmpeg export

5

Shorts Cuts (if long-form)

Extract transcript. Score segments by hook strength. FFmpeg cut + 9:16 crop + caption burn × 3 clips. Output: short_1.mp4, short_2.mp4, short_3.mp4.

FFmpeg · Whisper transcript · Hook scoring
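Hook scoring itself is not specified here, so the sketch below assumes each transcript segment already carries a numeric `score`. It greedily keeps the three highest-scoring segments that are at most 60 seconds long and do not overlap, then returns them in timeline order; `Segment` and `pick_shorts` are hypothetical names.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds into the long-form video
    end: float
    score: float  # hook strength; how it is computed is out of scope here


def pick_shorts(segments: list[Segment], n: int = 3, max_len: float = 60.0) -> list[Segment]:
    """Greedy: highest-scoring segments that fit max_len and don't overlap."""
    chosen: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if seg.end - seg.start > max_len:
            continue
        if any(seg.start < c.end and c.start < seg.end for c in chosen):
            continue
        chosen.append(seg)
        if len(chosen) == n:
            break
    return sorted(chosen, key=lambda s: s.start)  # chronological order for short_1..short_3
```

Greedy selection can miss a better combination of slightly lower scores, but it is simple and predictable for a first version.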

6

Thumbnail Brief → Warhol

Generate thumbnail brief (video title + hook text + brand context). Pass to Warhol. Warhol handles image generation via standard Visual ICP Gate. Spielberg receives rendered image path.

Warhol handoff · Visual ICP Gate

7

QA Gate (Hemingway)

Script accuracy vs. audio (spot-check 3 segments). Caption correctness (sample 10 captions). Render artifacts check (frames 1, mid, last). Duration within 10% of target. Must pass Hemingway before delivery.

Hemingway QA
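The QA sampling rules (frames 1, mid, last; 10 captions; 3 audio segments; duration within 10%) are mechanical enough to precompute before Hemingway reviews. A sketch with hypothetical helper names; seeding the sampler is a design assumption so that a re-run checks the same items.

```python
import random


def within_tolerance(actual_s: float, target_s: float, tol: float = 0.10) -> bool:
    """Duration rule: e.g. a 60s target accepts 54-66s."""
    return abs(actual_s - target_s) <= tol * target_s


def qa_sample_plan(total_frames: int, n_captions: int, n_segments: int, seed: int = 0) -> dict:
    """Indices to inspect: first/mid/last frame, 10 captions, 3 audio segments."""
    rng = random.Random(seed)  # seeded so a repeated QA pass checks the same items
    return {
        "frames": [0, total_frames // 2, total_frames - 1],  # frame 1 is index 0
        "captions": sorted(rng.sample(range(n_captions), min(10, n_captions))),
        "audio_segments": sorted(rng.sample(range(n_segments), min(3, n_segments))),
    }
```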

8

Handoff + Logging

Pass final files to Vee (social distribution). Log production run to Supabase: client, platform, duration, cost, output paths. Notify Hermes via Telegram on completion.

Vee distribution · Supabase video_production · Telegram
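Step 8's log row can be built so that the $5/video hard constraint is enforced at write time. A sketch assuming costs arrive as plain numbers keyed by the `video_production` column names; `Decimal` keeps the four-decimal `NUMERIC(6,4)` precision, and `production_row` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

MAX_COST_PER_VIDEO = Decimal("5.00")  # above this needs Robert's approval
COST_FIELDS = ("cost_script", "cost_voiceover", "cost_render", "cost_avatar")


def production_row(queue_id: str, client: str, platform: str, duration_s: float,
                   costs: dict, outputs: dict) -> dict:
    """Build a video_production row; refuse to log a run over the cost cap."""
    q = Decimal("0.0001")  # NUMERIC(6,4) scale
    row = {f: Decimal(str(costs.get(f, 0))).quantize(q, rounding=ROUND_HALF_UP)
           for f in COST_FIELDS}
    row["cost_total"] = sum(row.values(), Decimal("0"))
    if row["cost_total"] > MAX_COST_PER_VIDEO:
        raise RuntimeError(f"cost {row['cost_total']} exceeds $5/video cap; escalate to Robert")
    row.update(queue_id=queue_id, client=client, platform=platform,
               duration_s=int(round(duration_s)), **outputs)
    return row
```

Convert the `Decimal` values to strings before handing the row to a JSON-based Supabase client.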

V4 Pipeline Fit

Integration Architecture

Spielberg is the production layer in the V4 pipeline. It does not own strategy (Porter), copy (Ogilvy/Carnegie), creative direction (Warhol), or distribution (Vee). It receives from upstream, executes production, and hands off downstream.

Upstream — Receives From

Porter (Strategy): Campaign brief with video content objectives

Ogilvy (Copy): Script narrative for company-voice content

Carnegie (Personal Brand): Script narrative for Robert-voice content

Warhol (Creative): Visual direction (colors, motion style, brand codes) + rendered thumbnails

Scout (Research): Trending topics, hook angles, competitive intel

→

🎬 Spielberg

Video Production
Agent #15

→

Downstream — Delivers To

Vee (Social): Final video files + captions + thumbnails for platform scheduling

Hemingway (QA): Complete production package for review before any delivery

Covey (PM): Production logs, cost report, schedule status

Hermes (Orchestrator): Completion notification + Telegram summary per production run

Warhol Relationship — Creative Direction, Not Ownership

Role Boundary — Critical

Warhol briefs → Spielberg executes. Warhol owns the visual identity (colors, motion language, brand codes, aesthetic register). Spielberg executes production within that creative direction. Thumbnail generation: Spielberg writes the brief → Warhol generates the image → Spielberg receives the file path. Neither agent overrides the other. Conflicts escalate to Robert via Hermes.

Vee Relationship — Production vs. Distribution

Spielberg is the factory. Vee is the shelf. Spielberg delivers final assets: final_video.mp4, shorts cuts, thumbnail, caption file (.vtt), recommended hashtags. Vee handles platform scheduling, posting, and performance tracking. Spielberg never schedules or posts.

n8n Workflow Position

Spielberg runs as an n8n workflow triggered by:

A new row in Supabase video_queue (batch mode / scheduled)
A Hermes routing decision based on incoming brief
A direct Telegram command from Robert: /video [brief]

Build Roadmap

Phase 1 vs Phase 2 Scope

Phase 1

MVP — Build Now

🔵 Build Target: Q2 2026

The core pipeline that gets Spielberg operational and billing. Proves the model before investing in the full avatar/composite layer.

Script generation via Claude Sonnet (hook + body + CTA, production-formatted)
Voiceover synthesis via ElevenLabs API (default voice library)
Remotion rendering from 3 starter templates (Shorts text-motion, YouTube explainer, LinkedIn quote-card)
FFmpeg Shorts cutting from long-form (9:16 crop, caption burn, 3 cuts/video)
Thumbnail brief → Warhol handoff
Hemingway QA gate integration
Supabase cost + production logging
Vee handoff for distribution
Telegram production completion notification
ICON Golf Cars or Terry's Marine as Phase 1 pilot client

Phase 2

Full Pipeline

🔷 Target: Q3 2026

Layer on the avatar/composite pipeline once Phase 1 is proven and billing. Premium positioning, higher production value, HeyGen avatar presenter.

Client voice cloning (ElevenLabs clone from 2-min sample)
HeyGen avatar generation via API (photorealistic talking-head)
Avatar + motion graphics composite in Remotion
Full 1080p / 4K YouTube output capability
A/B variant production (avatar-led vs. text-motion) for performance testing
Premiere Pro automation layer (ExtendScript/UXP) as alternative render path
Expand the Remotion template library to 5+ templates
Multi-client batch queue with priority scheduling
Video performance feedback loop (Vee reports → Scout flags winning formats → Spielberg adjusts templates)

Pilot Client Recommendation

Scout identified two existing clients as ideal Phase 1 pilots:

⛳

ICON Golf Cars

High visual appeal product, existing content pipeline in The Machine, YouTube channel opportunity. Start with 4 YouTube explainers + 12 Shorts/month at $2,500–$3,500/mo pilot rate.

Pilot #1

⚓

Terry's Marine

Phase 2 website redesign already in progress — natural extension to video content. Boat/marina content performs well on YouTube. Good for Shorts repurposing workflow test.

Pilot #2

Economics

Cost Model + Client Pricing

Per-Video Production Cost

Component	Phase 1 Cost	Phase 2 Cost	Notes
Script Generation (Claude Sonnet)	~$0.05–0.15	~$0.05–0.15	1,000–2,000 tokens input/output
Voiceover (ElevenLabs)	~$0.10–0.30	~$0.10–0.30	~1,000 chars ≈ 90s audio
Remotion Rendering	~$0.10–0.20	~$0.10–0.20	Local Mac Studio compute; near-free
Shorts Cuts (FFmpeg)	~$0.01–0.05	~$0.01–0.05	Local compute, essentially free
HeyGen Avatar	N/A	~$0.25–0.50	API tier dependent; Phase 2 only
Total per video	~$0.26–0.70	~$0.51–1.20	Under $2 at scale in both phases
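The per-video totals follow from summing the component ranges. A quick check, with the estimates transcribed from the table and `price_to_cost` as a hypothetical helper for the × multiples quoted in this spec:

```python
# (low, high) per-video estimates in USD, transcribed from the cost table.
PHASE1 = {
    "script": (0.05, 0.15),
    "voiceover": (0.10, 0.30),
    "render": (0.10, 0.20),
    "shorts": (0.01, 0.05),
}
PHASE2 = {**PHASE1, "avatar": (0.25, 0.50)}


def cost_range(components: dict) -> tuple[float, float]:
    """Sum the low and high ends separately, rounded to cents."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 2), round(high, 2)


def price_to_cost(price: float, cost_per_video: float) -> float:
    """Client price per video divided by production cost per video."""
    return price / cost_per_video
```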

📊 Margin Reality Check

At $0.70/video cost and a $500 client price, price is ~714× production cost. At $0.80/video cost and a $5,000 price (the premium positioning reported by @ridark_eth), it is 6,250×. Both multiples cover per-video production cost only, not tool subscriptions or labor. Even the conservative model is extraordinary. The constraint is client acquisition and production capacity, not cost.

Recommended Client Packages

Starter

$2,500

per month

12 Shorts (60s, platform-native)
2 Long-form YouTube videos
Thumbnails included
Captions on all content
Distribution via Vee

~$2,480 gross margin/mo

⭐ Growth (Recommended)

$4,500

per month

4 Long-form YouTube videos
12 Shorts (3 cuts per long-form)
Thumbnails + SEO titles
Captions on all content
Monthly performance report
Distribution via Vee

~$4,470 gross margin/mo

Authority

$7,500

per month

8 Long-form YouTube videos
24 Shorts clips
Thumbnails + SEO optimization
Voice clone (client's voice)
Channel management included
Full analytics + reporting

~$7,460 gross margin/mo

💰 Revenue Target

Benchmark: a 20-year-old operator reports $18,500/mo from 5 YouTube clients (avg $3,700/client) using the same stack. Conservative target: 5 video clients × $3,700 avg = $18,500/mo incremental MRR from Spielberg alone, a 68.5% increase on the current $27K/mo baseline.

Technical Infrastructure

Tools Required

🔊

ElevenLabs API

Role: Voice synthesis + client voice cloning
Phase: P1 (default voices) + P2 (clone)
Cost: ~$5–22/mo (Starter to Creator tier)
API: REST, per-character pricing
Status: Production-ready

Phase 1 Required

⚛️

Remotion

Role: Programmatic video rendering in React/TypeScript
Phase: P1 (templates) + P2 (full composite)
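Cost: Source-available (not OSI open-source); free for individuals and companies of up to 3 employees, larger companies need a paid Remotion company license; compute is local Mac Studio
API: Headless CLI (`npx remotion render`) or Node.js programmatic (`@remotion/renderer`)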
Status: Production-ready

Phase 1 Required

🎞️

FFmpeg

Role: Audio processing, Shorts cutting, format conversion, caption burn
Phase: P1
Cost: Free open-source
API: CLI, callable from Node/Python/n8n
Status: Already available on Mac Studio

Phase 1 Required

🤖

HeyGen API

Role: Photorealistic AI avatar generation (talking-head presenter)
Phase: P2 only
Cost: ~$0.25–0.50/video at API tier
API: REST v2, POST to /v2/video/generate
Status: Production-ready; defer until P1 proven

Phase 2

🗄️

Supabase

Role: video_queue table (brief intake), video_production table (run logging, cost tracking)
Phase: P1
Cost: Existing infrastructure
Status: Already deployed, RLS hardened May 9

Phase 1 Required

🎙️

Whisper (OpenAI)

Role: Transcript extraction for Shorts repurposing and caption generation
Phase: P1
Cost: ~$0.006/min audio (API) or free local via whisper.cpp
Status: Available; recommend local whisper.cpp on Mac Studio M4 Pro

Phase 1 Required

Tool Acquisition Checklist

Tool	Action	Effort	Phase
ElevenLabs API key	Sign up at elevenlabs.io → API Keys → Create key → store in n8n credentials	15 min	P1
Remotion	Scaffold a project in `video-templates/` on Mac Studio (`npx create-video@latest`, or add `remotion` and `@remotion/cli` to an existing project) → build 3 starter templates	2–4 hrs	P1
FFmpeg	Confirm installed: `ffmpeg -version` on Mac Studio. Install via Homebrew if not.	5 min	P1
Supabase tables	Create `video_queue` and `video_production` tables per schema below	30 min	P1
Whisper.cpp	Clone whisper.cpp → compile on Mac Studio M4 Pro → download a ggml model → create n8n Execute Command node	1 hr	P1
HeyGen API key	Sign up at heygen.com → API → generate key → defer until Phase 2 approved	15 min	P2

Supabase Table Schema

-- Video production brief queue
CREATE TABLE video_queue (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  client       TEXT NOT NULL,
  campaign_id  TEXT,
  platform     TEXT NOT NULL,
  format       TEXT NOT NULL,
  duration_target TEXT,
  topic        TEXT NOT NULL,
  message      TEXT NOT NULL,
  tone         TEXT,
  cta          TEXT,
  voice        TEXT DEFAULT 'default',
  brand_colors JSONB,
  logo_path    TEXT,
  reference_urls JSONB,
  priority     INTEGER DEFAULT 5,
  status       TEXT DEFAULT 'queued',  -- queued | in_progress | done | failed
  created_at   TIMESTAMPTZ DEFAULT now()
);

-- Production run log
CREATE TABLE video_production (
  id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  queue_id       UUID REFERENCES video_queue(id),
  client         TEXT NOT NULL,
  platform       TEXT,
  duration_s     INTEGER,
  cost_script    NUMERIC(6,4),
  cost_voiceover NUMERIC(6,4),
  cost_render    NUMERIC(6,4),
  cost_avatar    NUMERIC(6,4),
  cost_total     NUMERIC(6,4),
  output_video   TEXT,
  output_shorts  JSONB,
  output_thumb   TEXT,
  output_captions TEXT,
  status         TEXT DEFAULT 'complete',
  produced_at    TIMESTAMPTZ DEFAULT now()
);
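Batch mode needs an ordering rule the schema leaves open. A sketch that treats a lower `priority` number as more urgent (an assumption; the table only sets `DEFAULT 5`) and breaks ties by `created_at`, equivalent to `WHERE status = 'queued' ORDER BY priority, created_at`; `next_batch` is a hypothetical name, and the ISO-8601 timestamps are parsed with `datetime.fromisoformat`.

```python
from datetime import datetime


def next_batch(rows: list[dict], limit: int = 5) -> list[dict]:
    """Queued briefs, most urgent first, oldest first within the same priority.

    Assumption: a LOWER priority number is MORE urgent (the schema only sets DEFAULT 5).
    """
    queued = [r for r in rows if r.get("status", "queued") == "queued"]
    queued.sort(key=lambda r: (r.get("priority", 5), datetime.fromisoformat(r["created_at"])))
    return queued[:limit]
```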

Definition of Done

When a Spielberg Production Is Complete

A production run is not done until all of these conditions are met. Hemingway QA must sign off before any asset is delivered to a client or scheduled for distribution.

✓ Brief received and validated — all schema fields present
✓ Script has hook (≤3s), body (timing-marked, visual cues embedded), and CTA (one action)
✓ Voiceover audio is normalized to -14 LUFS with no clipping or distortion
✓ Video renders without artifacts — checked at frame 1, midpoint, and last frame
✓ Duration is within 10% of target (e.g., 60s target → 54–66s accepted)
✓ Captions present and ≥95% accurate (spot-checked against transcript)
✓ Shorts cuts produced for all long-form content (minimum 1 Short per long-form video)
✓ Thumbnail brief passed to Warhol and thumbnail received or confirmed in queue
✓ Hemingway QA sign-off received — no delivery without QA pass
✓ Production log written to Supabase: client, cost, output paths, timestamps
✓ Final files delivered to Vee or staged in delivery folder (clients/{client}/video-delivery/)
✓ Hermes notified via Telegram with production summary

Escalation Rules

Condition	Escalate To
Brief is unclear or campaign objective ambiguous	Porter (Strategy)
Script requires deep brand voice work	Ogilvy (client) or Carnegie (Robert)
Visual direction undefined, brand codes missing	Warhol (Creative)
CTA optimization for conversion-focused video	Eisenberg (CRO)
ALL video assets before client delivery	Hemingway (QA)
New voice clone requiring client consent	Robert (approval required)
Budget approval for Phase 2 tools (HeyGen)	Robert (approval required)
Any irreversible platform action	Robert (approval required)

Hard Constraints

⛔ Non-Negotiable Rules

NEVER ship a video without Hemingway QA sign-off.
NEVER use a client's voice clone without documented consent on file.
NEVER schedule or post content — that is Vee's domain.
NEVER freelance on brand visual decisions — escalate to Warhol.
NEVER exceed $5/video production cost without Robert's approval.
NEVER activate HeyGen API without Phase 2 approval from Robert.

Continuous Improvement

Self-Improvement Loop

After every task where a correction was made or output was rejected:

Open machine/agents/v4/video/LEARNINGS.md
Add a dated entry: Pattern (what went wrong) → Rule (how to prevent recurrence) → Trigger (when the rule applies)
At the start of every new production run, read LEARNINGS.md before executing
If a rule applies to the current task, apply it proactively — do not wait to be corrected again

The goal: mistake rate drops over time. Every rejected video makes the next 100 better.