Why Spielberg — Why Now
Two independent signals converged this week, both pointing to the same conclusion: The Machine needs a video agent, and the stack is proven.
@ridark_eth: "Claude + ElevenLabs + Premiere Pro automated video factory. $0.80 cost per video, $5K sale price. Zero editing skills required."
@gippp69: A 20-year-old charges 5 YouTube clients $18,500/month using only $74/month in tools. Stack: Claude writes scripts, Python automates voiceovers and asset creation, Premiere Pro auto-cuts to Shorts. Output: 6 content pieces per idea per week.
Both signals scored in the top tier of the weekly Twitter intelligence batch. The Spielberg agent concept has moved from hypothesis to validated real-world business model. The only question is how fast we build it.
The Machine currently has zero video capability. Meanwhile video is the #1 content format by engagement across YouTube, LinkedIn, Instagram Reels, and TikTok. Every week without Spielberg is a week The Machine's clients are missing the highest-engagement format in digital marketing.
Who Is Spielberg
Orchestration Modes
| Mode | Trigger | Behavior |
|---|---|---|
| SOLO | Direct brief from Robert or Hermes | Independent end-to-end production: brief → script → VO → render → export |
| ORCHESTRATED | Campaign ID provided in context | Receives brief from Porter, script from Ogilvy/Carnegie, creative direction from Warhol, delivers to Vee |
| REPURPOSE | Existing video file or YouTube URL | Transcript extraction → hook scoring → 3× Shorts cuts + captions baked in |
| BATCH | Queue in Supabase video_queue | Processes multiple briefs in priority order; logs cost per run |
| VALIDATION | QA review request | Evaluates script quality, audio fidelity, render artifacts, caption accuracy for existing video |
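The trigger column reads as a dispatch rule. A minimal sketch, assuming the incoming request arrives as a dict: the key names (`qa_review`, `source_video`, `from_queue`, `campaign_id`) and the precedence order when several triggers are present are illustrative assumptions, not part of the spec.

```python
from enum import Enum


class Mode(Enum):
    SOLO = "solo"
    ORCHESTRATED = "orchestrated"
    REPURPOSE = "repurpose"
    BATCH = "batch"
    VALIDATION = "validation"


def select_mode(context: dict) -> Mode:
    """Map an incoming request to a mode using the triggers in the table.

    Key names and precedence are assumptions made for illustration.
    """
    if context.get("qa_review"):        # QA review request
        return Mode.VALIDATION
    if context.get("source_video"):     # existing video file or YouTube URL
        return Mode.REPURPOSE
    if context.get("from_queue"):       # row pulled from video_queue
        return Mode.BATCH
    if context.get("campaign_id"):      # campaign ID provided in context
        return Mode.ORCHESTRATED
    return Mode.SOLO                    # direct brief from Robert or Hermes
```

A bare brief with no extra context falls through to SOLO, which matches the "direct brief" trigger.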
Full Capability Brief
Script Generation
Spielberg generates production-formatted scripts — not essays, not blog posts. Every script has timing markers, visual cues, and embedded production instructions.
- Hook formats: Bold claim · surprising stat · direct question · contrarian take · "here's what nobody tells you"
- Body structure: 3–5 value beats, each with [VISUAL: ...] and [TEXT OVERLAY: ...] markers
- CTA: One action, stated in the last 5–10 seconds; never two CTAs
- Duration targeting: 30s / 60s / 3-min / 10-min — same architecture, different density
- Hook testing: Generates 3 hook options, selects strongest before scripting the full video
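The script rules above can be held in a small data structure so they are checked before voiceover starts. This is a sketch only: the class and field names are hypothetical, and the single `cta` string is one way to make "never two CTAs" true by construction.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    narration: str
    visual: str         # rendered as [VISUAL: ...]
    text_overlay: str   # rendered as [TEXT OVERLAY: ...]


@dataclass
class Script:
    hook: str
    beats: list[Beat]
    cta: str            # exactly one call to action
    duration_target_s: int

    def validate(self) -> list[str]:
        """Return rule violations; an empty list means the script passes."""
        problems = []
        if not self.hook.strip():
            problems.append("missing hook")
        if not 3 <= len(self.beats) <= 5:
            problems.append("body needs 3-5 value beats")
        if not self.cta.strip():
            problems.append("missing CTA")
        return problems

    def to_markdown(self) -> str:
        """Render the human-readable script with production markers."""
        lines = [f"HOOK: {self.hook}"]
        for beat in self.beats:
            lines.append(beat.narration)
            lines.append(f"[VISUAL: {beat.visual}]")
            lines.append(f"[TEXT OVERLAY: {beat.text_overlay}]")
        lines.append(f"CTA: {self.cta}")
        return "\n".join(lines)
```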
Voiceover Synthesis (ElevenLabs)
Eliminates voiceover talent cost ($150–500/video for professional VO). Produces broadcast-quality narration from text.
- Default voices: Approved library — warm-authoritative (B2B), conversational (educational), energetic (Shorts)
- Client voice cloning: Clone client's voice from a 2-minute audio sample for brand-consistent narration
- Output standard: WAV or MP3, 44.1kHz, normalized to -14 LUFS (YouTube/Spotify compliance)
- Cost: ~$0.10–0.30/video (1,000 chars ≈ 90s audio at natural speech rate)
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
Content-Type: application/json
xi-api-key: {ELEVEN_API_KEY}

{
  "text": "{script_text}",
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.8
  }
}
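The same request can be issued from Python with only the standard library. A sketch under assumptions: the function names and the `voiceover.mp3` default are hypothetical, and it assumes the endpoint returns the audio bytes directly in the response body. Loudness normalization to -14 LUFS is a separate step not covered here.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(script_text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST request shown above without sending it."""
    payload = {
        "text": script_text,
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "xi-api-key": api_key},
        method="POST",
    )


def synthesize(script_text: str, voice_id: str, api_key: str,
               out_path: str = "voiceover.mp3") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(script_text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any characters are billed.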
Programmatic Rendering (Remotion — Phase 1)
Videos become reproducible, versionable, templatable code artifacts. Rendered headlessly via Remotion CLI — no manual editing required.
- Input: Script JSON + audio file path + brand color palette + logo path
- Templates (Phase 1):
  - shorts-text-motion.tsx: animated text, bold titles, background color blocks (Shorts/Reels/TikTok)
  - explainer-b-roll.tsx: text overlays on B-roll, chapter markers (YouTube)
  - quote-card-sequence.tsx: sequential quote cards with fade transitions (LinkedIn)
- Output: MP4 at platform-optimal resolution and framerate
- Cost: ~$0.10–0.20/video compute
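A headless render can be driven by assembling the Remotion CLI call from the inputs listed above. This is a sketch, not the project's actual wiring: the entry point `src/index.ts`, the composition ID, and the prop names (`script`, `audioPath`, `brandColors`, `logoPath`) are all hypothetical and must match whatever the templates really declare.

```python
import json


def remotion_render_args(composition_id: str, output_path: str, script: dict,
                         audio_path: str, brand_colors: list[str], logo_path: str,
                         entry_point: str = "src/index.ts") -> list[str]:
    """Return an argv list for `npx remotion render` with input props as JSON."""
    props = {
        "script": script,
        "audioPath": audio_path,
        "brandColors": brand_colors,
        "logoPath": logo_path,
    }
    return [
        "npx", "remotion", "render",
        entry_point, composition_id, output_path,
        f"--props={json.dumps(props)}",
    ]
```

The list can be passed to `subprocess.run(args, check=True)` from an n8n exec step or a worker script; passing argv as a list avoids shell quoting problems with the JSON props.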
Shorts Repurposing (FFmpeg)
Turns every long-form video into 3 platform-native Shorts cuts automatically. Near-zero cost.
- Process: Transcript → hook scoring → timestamp selection → FFmpeg trim + 9:16 crop + caption burn
- Output: 3× MP4 clips ≤60s, captions baked, ready for Shorts/Reels/TikTok
# Filters are chained in a single -vf; a second -vf would replace the first.
ffmpeg -i input.mp4 \
  -ss {START} -t {DURATION} \
  -vf "crop=ih*9/16:ih,scale=1080:1920,subtitles=captions.srt:force_style='FontName=Inter,FontSize=18'" \
  -c:v libx264 -crf 23 -preset fast \
  -c:a aac -b:a 192k \
  output_short_{n}.mp4
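The timestamp selection and trim steps can be sketched together. Assumptions: hook scores are supplied by an upstream step (the scoring method is not defined here), selection is a simple greedy pick of the highest-scoring non-overlapping segments, and the function names are hypothetical. The crop, scale, and subtitle filters are chained in one `-vf` value because ffmpeg applies only the last `-vf` given for a stream.

```python
def pick_short_cuts(segments, count=3, max_len_s=60.0):
    """Pick up to `count` non-overlapping segments, best hook score first.

    segments: iterable of (start_s, end_s, hook_score) tuples.
    Returns the chosen segments sorted by start time.
    """
    chosen = []
    for start, end, score in sorted(segments, key=lambda s: s[2], reverse=True):
        if end - start > max_len_s:
            continue  # too long for a Short
        if any(start < c_end and c_start < end for c_start, c_end, _ in chosen):
            continue  # overlaps a cut that is already selected
        chosen.append((start, end, score))
        if len(chosen) == count:
            break
    return sorted(chosen)


def ffmpeg_short_args(src, start_s, duration_s, captions, out_path):
    """Return an argv list that trims, crops to 9:16, and burns captions."""
    video_filter = (
        "crop=ih*9/16:ih,scale=1080:1920,"
        f"subtitles={captions}:force_style='FontName=Inter,FontSize=18'"
    )
    return [
        "ffmpeg", "-i", src,
        "-ss", str(start_s), "-t", str(duration_s),
        "-vf", video_filter,
        "-c:v", "libx264", "-crf", "23", "-preset", "fast",
        "-c:a", "aac", "-b:a", "192k",
        out_path,
    ]
```

Each selected segment becomes one `ffmpeg_short_args(...)` call, for example with `out_path=f"short_{n}.mp4"`.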
Avatar Presentation (HeyGen — Phase 2)
Photorealistic AI talking-head without any camera or talent. Solves the human presenter problem at scale.
- Input: Script text + audio file (from ElevenLabs)
- Output: Presenter video composited into Remotion template
- Cost: ~$0.25–0.50/video (API tier dependent)
- Phase 2 only: Do not build until Phase 1 pipeline is proven
Platform Specs Reference
| Platform | Format | Resolution | Duration | Aspect |
|---|---|---|---|---|
| YouTube Long-Form | MP4 H.264 | 1920×1080 | 3–20 min | 16:9 |
| YouTube Shorts | MP4 H.264 | 1080×1920 | ≤60s | 9:16 |
| Instagram Reels | MP4 H.264 | 1080×1920 | ≤90s | 9:16 |
| TikTok | MP4 H.264 | 1080×1920 | ≤3 min | 9:16 |
| LinkedIn Video | MP4 H.264 | 1920×1080 | 30s–10 min | 16:9 |
| Instagram Feed | MP4 H.264 | 1080×1080 | ≤60s | 1:1 |
Audio standard across all platforms: AAC, 192kbps, -14 LUFS
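The spec table can double as a pre-export check. A minimal sketch: the platform keys and function name are hypothetical, and durations are the table's limits converted to seconds (a lower bound of 0 where the table gives only a maximum).

```python
PLATFORM_SPECS = {
    # platform key: (width, height, min_seconds, max_seconds)
    "youtube_long": (1920, 1080, 180, 1200),
    "youtube_shorts": (1080, 1920, 0, 60),
    "instagram_reels": (1080, 1920, 0, 90),
    "tiktok": (1080, 1920, 0, 180),
    "linkedin_video": (1920, 1080, 30, 600),
    "instagram_feed": (1080, 1080, 0, 60),
}


def spec_violations(platform: str, width: int, height: int, duration_s: float) -> list[str]:
    """Compare a rendered file's resolution and duration with the table."""
    exp_w, exp_h, min_s, max_s = PLATFORM_SPECS[platform]
    problems = []
    if (width, height) != (exp_w, exp_h):
        problems.append(f"resolution {width}x{height}, expected {exp_w}x{exp_h}")
    if not min_s <= duration_s <= max_s:
        problems.append(f"duration {duration_s}s outside {min_s}-{max_s}s")
    return problems
```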
The Full Pipeline
Every Spielberg production run follows this sequence. Phase 1 steps are active now. Phase 2 steps are additive — they layer onto Phase 1 without replacing it.
- script.json + script.md (human-readable)
- voiceover.mp3
- final_video.mp4
- short_1.mp4, short_2.mp4, short_3.mp4
Integration Architecture
Spielberg is the production layer in the V4 pipeline. It does not own strategy (Porter), copy (Ogilvy/Carnegie), creative direction (Warhol), or distribution (Vee). It receives from upstream, executes production, and hands off downstream.
Upstream — Receives From
Agent #15
Downstream — Delivers To
Warhol Relationship — Creative Direction, Not Ownership
Warhol briefs → Spielberg executes. Warhol owns the visual identity (colors, motion language, brand codes, aesthetic register). Spielberg executes production within that creative direction. Thumbnail generation: Spielberg writes the brief → Warhol generates the image → Spielberg receives the file path. Neither agent overrides the other. Conflicts escalate to Robert via Hermes.
Vee Relationship — Production vs. Distribution
Spielberg is the factory. Vee is the shelf. Spielberg delivers final assets: final_video.mp4, shorts cuts, thumbnail, caption file (.vtt), recommended hashtags. Vee handles platform scheduling, posting, and performance tracking. Spielberg never schedules or posts.
n8n Workflow Position
Spielberg runs as an n8n workflow triggered by:
- A new row in Supabase video_queue (batch mode / scheduled)
- A Hermes routing decision based on incoming brief
- A direct Telegram command from Robert: /video [brief]
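For the Telegram trigger, the brief has to be separated from the command before it is queued. A small sketch, assuming the message text arrives as a plain string; the function name is hypothetical and an empty brief is treated as "no command".

```python
from typing import Optional


def parse_video_command(message: str) -> Optional[str]:
    """Return the brief from '/video <brief>', or None if it is not that command."""
    parts = message.strip().split(maxsplit=1)
    if len(parts) < 2 or parts[0] != "/video":
        return None
    return parts[1].strip()
```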
Phase 1 vs Phase 2 Scope
Phase 1: the core pipeline that gets Spielberg operational and billing. It proves the model before investing in the full avatar/composite layer.
- Script generation via Claude Sonnet (hook + body + CTA, production-formatted)
- Voiceover synthesis via ElevenLabs API (default voice library)
- Remotion rendering from 3 starter templates (Shorts text-motion, YouTube explainer, LinkedIn quote-card)
- FFmpeg Shorts cutting from long-form (9:16 crop, caption burn, 3 cuts/video)
- Thumbnail brief → Warhol handoff
- Hemingway QA gate integration
- Supabase cost + production logging
- Vee handoff for distribution
- Telegram production completion notification
- ICON Golf Cars or Terry's Marine as Phase 1 pilot client
Phase 2: layer on the avatar/composite pipeline once Phase 1 is proven and billing. It adds premium positioning, higher production value, and a HeyGen avatar presenter.
- Client voice cloning (ElevenLabs clone from 2-min sample)
- HeyGen avatar generation via API (photorealistic talking-head)
- Avatar + motion graphics composite in Remotion
- Full 1080p / 4K YouTube output capability
- A/B variant production (avatar-led vs. text-motion) for performance testing
- Premiere Pro automation layer (ExtendScript/UXP) as alternative render path
- 5+ Remotion template library expansion
- Multi-client batch queue with priority scheduling
- Video performance feedback loop (Vee reports → Scout flags winning formats → Spielberg adjusts templates)
Pilot Client Recommendation
Scout identified two existing clients as ideal Phase 1 pilots: ICON Golf Cars and Terry's Marine.
Cost Model + Client Pricing
Per-Video Production Cost
| Component | Phase 1 Cost | Phase 2 Cost | Notes |
|---|---|---|---|
| Script Generation (Claude Sonnet) | ~$0.05–0.15 | ~$0.05–0.15 | 1,000–2,000 tokens input/output |
| Voiceover (ElevenLabs) | ~$0.10–0.30 | ~$0.10–0.30 | ~1,000 chars ≈ 90s audio |
| Remotion Rendering | ~$0.10–0.20 | ~$0.10–0.20 | Local Mac Studio compute; near-free |
| Shorts Cuts (FFmpeg) | ~$0.01–0.05 | ~$0.01–0.05 | Local compute, essentially free |
| HeyGen Avatar | N/A | ~$0.25–0.50 | API tier dependent; Phase 2 only |
| Total per video | ~$0.26–0.70 | ~$0.51–1.20 | Under $2 at scale in both phases |
At $0.70/video cost and a $500 client price, the price is about 714× the production cost. At $0.80/video cost and a $5,000 price (the premium positioning reported by @ridark_eth), it is 6,250×. Even in the conservative model, production cost is negligible relative to price. The constraint is client acquisition and production capacity, not cost.
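The totals and multiples can be re-derived from the component ranges in the table. A short sketch with hypothetical names; the multiple is price divided by cost, so a strict ROI figure, (price − cost) / cost, is that multiple minus one.

```python
# (low, high) cost per video in USD, taken from the table above
PHASE1 = {
    "script": (0.05, 0.15),
    "voiceover": (0.10, 0.30),
    "render": (0.10, 0.20),
    "shorts": (0.01, 0.05),
}
PHASE2 = {**PHASE1, "avatar": (0.25, 0.50)}


def total_range(components: dict) -> tuple:
    """Sum the low ends and the high ends, rounded to cents."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 2), round(high, 2)


def price_to_cost_multiple(price: float, cost: float) -> float:
    """How many times the production cost the client price covers."""
    return price / cost
```

`total_range(PHASE1)` and `total_range(PHASE2)` reproduce the table's total row, and `price_to_cost_multiple(500, 0.70)` gives the roughly 714× figure.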
Recommended Client Packages
- 12 Shorts (60s, platform-native)
- 2 Long-form YouTube videos
- Thumbnails included
- Captions on all content
- Distribution via Vee
- 4 Long-form YouTube videos
- 12 Shorts (3 cuts per long-form)
- Thumbnails + SEO titles
- Captions on all content
- Monthly performance report
- Distribution via Vee
- 8 Long-form YouTube videos
- 24 Shorts clips
- Thumbnails + SEO optimization
- Voice clone (client's voice)
- Channel management included
- Full analytics + reporting
Benchmark: 20-year-old operator validated $18,500/mo from 5 YouTube clients (avg $3,700/client) using the same stack. Conservative target: 5 video clients × $3,700 avg = $18,500/mo incremental MRR from Spielberg alone — a 68% increase on current $27K/mo baseline.
Tools Required
ElevenLabs
Phase: P1 (default voices) + P2 (clone)
Cost: ~$5–22/mo (Starter to Creator tier)
API: REST, per-character pricing
Status: Production-ready
Remotion
Phase: P1 (templates) + P2 (full composite)
Cost: Free open-source; compute is local Mac Studio
API: Headless CLI or Node.js programmatic
Status: Production-ready
FFmpeg
Phase: P1
Cost: Free open-source
API: CLI, callable from Node/Python/n8n
Status: Already available on Mac Studio
HeyGen
Phase: P2 only
Cost: ~$0.25–0.50/video at API tier
API: REST v2, POST to /v2/video/generate
Status: Production-ready; defer until P1 proven
Supabase
Phase: P1
Cost: Existing infrastructure
Status: Already deployed, RLS hardened May 9
Whisper (transcription)
Phase: P1
Cost: ~$0.006/min audio (API) or free local via whisper.cpp
Status: Available; recommend local whisper.cpp on Mac Studio M4 Pro
Tool Acquisition Checklist
| Tool | Action | Effort | Phase |
|---|---|---|---|
| ElevenLabs API key | Sign up at elevenlabs.io → API Keys → Create key → store in n8n credentials | 15 min | P1 |
| Remotion | npm install remotion on Mac Studio → create video-templates/ dir → build 3 starter templates | 2–4 hrs | P1 |
| FFmpeg | Confirm installed: ffmpeg -version on Mac Studio. Install via Homebrew if not. | 5 min | P1 |
| Supabase tables | Create video_queue and video_production tables per schema below | 30 min | P1 |
| Whisper.cpp | Clone whisper.cpp → compile on Mac Studio M4 Pro → create n8n shell exec node | 1 hr | P1 |
| HeyGen API key | Sign up at heygen.com → API → generate key → defer until Phase 2 approved | 15 min | P2 |
Supabase Table Schema
-- Video production brief queue
CREATE TABLE video_queue (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client TEXT NOT NULL,
campaign_id TEXT,
platform TEXT NOT NULL,
format TEXT NOT NULL,
duration_target TEXT,
topic TEXT NOT NULL,
message TEXT NOT NULL,
tone TEXT,
cta TEXT,
voice TEXT DEFAULT 'default',
brand_colors JSONB,
logo_path TEXT,
reference_urls JSONB,
priority INTEGER DEFAULT 5,
status TEXT DEFAULT 'queued', -- queued | in_progress | done | failed
created_at TIMESTAMPTZ DEFAULT now()
);
-- Production run log
CREATE TABLE video_production (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
queue_id UUID REFERENCES video_queue(id),
client TEXT NOT NULL,
platform TEXT,
duration_s INTEGER,
cost_script NUMERIC(6,4),
cost_voiceover NUMERIC(6,4),
cost_render NUMERIC(6,4),
cost_avatar NUMERIC(6,4),
cost_total NUMERIC(6,4),
output_video TEXT,
output_shorts JSONB,
output_thumb TEXT,
output_captions TEXT,
status TEXT DEFAULT 'complete',
produced_at TIMESTAMPTZ DEFAULT now()
);
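A production log row can be assembled in code before it is written to video_production. This sketch makes two assumptions worth flagging: `cost_total` is the sum of the four cost columns in the schema, and the function and dict key names are hypothetical. `Decimal` is used so values match the four-decimal NUMERIC(6,4) columns.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_PLACES = Decimal("0.0001")


def money(value) -> Decimal:
    """Convert a number to a 4-decimal Decimal, matching NUMERIC(6,4)."""
    return Decimal(str(value)).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


def build_production_row(queue_id, client, platform, duration_s, costs, outputs):
    """Build a dict shaped like one video_production row.

    costs: per-component USD costs; missing components count as 0.
    outputs: file paths keyed by 'video', 'shorts', 'thumb', 'captions'.
    """
    parts = {name: money(costs.get(name, 0))
             for name in ("script", "voiceover", "render", "avatar")}
    return {
        "queue_id": queue_id,
        "client": client,
        "platform": platform,
        "duration_s": duration_s,
        "cost_script": parts["script"],
        "cost_voiceover": parts["voiceover"],
        "cost_render": parts["render"],
        "cost_avatar": parts["avatar"],
        "cost_total": sum(parts.values(), Decimal("0")),
        "output_video": outputs.get("video"),
        "output_shorts": outputs.get("shorts", []),
        "output_thumb": outputs.get("thumb"),
        "output_captions": outputs.get("captions"),
    }
```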
When a Spielberg Production Is Complete
A production run is not done until all of these conditions are met. Hemingway QA must sign off before any asset is delivered to a client or scheduled for distribution.
- ✓ Brief received and validated — all schema fields present
- ✓ Script has hook (3s), body (timing-marked, visual cues embedded), and CTA (one action)
- ✓ Voiceover audio is normalized to -14 LUFS with no clipping or distortion
- ✓ Video renders without artifacts — checked at frame 1, midpoint, and last frame
- ✓ Duration is within 10% of target (e.g., 60s target → 54–66s accepted)
- ✓ Captions present and ≥95% accurate (spot-checked against transcript)
- ✓ Shorts cuts produced for all long-form content (minimum 1 Short per long-form video)
- ✓ Thumbnail brief passed to Warhol and thumbnail received or confirmed in queue
- ✓ Hemingway QA sign-off received — no delivery without QA pass
- ✓ Production log written to Supabase: client, cost, output paths, timestamps
- ✓ Final files delivered to Vee or staged in delivery folder (clients/{client}/video-delivery/)
- ✓ Hermes notified via Telegram with production summary
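Two of the checks above are numeric and easy to automate. A sketch with hypothetical function names: the duration check uses integer arithmetic so the 54s and 66s boundaries pass exactly, and the caption metric is a word-level similarity ratio from `difflib`, which is one possible stand-in for "accuracy", not a defined standard.

```python
import difflib


def duration_within_tolerance(target_s, actual_s, tolerance_pct=10) -> bool:
    """True when the actual duration is within tolerance_pct of the target."""
    return abs(actual_s - target_s) * 100 <= tolerance_pct * target_s


def caption_accuracy(transcript: str, captions: str) -> float:
    """Word-level similarity between transcript and captions, from 0.0 to 1.0."""
    reference = transcript.lower().split()
    candidate = captions.lower().split()
    if not reference and not candidate:
        return 1.0
    return difflib.SequenceMatcher(None, reference, candidate).ratio()
```

A run would pass these two checks when `duration_within_tolerance(...)` is true and `caption_accuracy(...) >= 0.95`.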
Escalation Rules
| Condition | Escalate To |
|---|---|
| Brief is unclear or campaign objective ambiguous | Porter (Strategy) |
| Script requires deep brand voice work | Ogilvy (client) or Carnegie (Robert) |
| Visual direction undefined, brand codes missing | Warhol (Creative) |
| CTA optimization for conversion-focused video | Eisenberg (CRO) |
| ALL video assets before client delivery | Hemingway (QA) |
| New voice clone requiring client consent | Robert (approval required) |
| Budget approval for Phase 2 tools (HeyGen) | Robert (approval required) |
| Any irreversible platform action | Robert (approval required) |
Hard Constraints
NEVER ship a video without Hemingway QA sign-off.
NEVER use a client's voice clone without documented consent on file.
NEVER schedule or post content — that is Vee's domain.
NEVER freelance on brand visual decisions — escalate to Warhol.
NEVER exceed $5/video production cost without Robert's approval.
NEVER activate HeyGen API without Phase 2 approval from Robert.
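Several of these constraints can be enforced as a gate that runs before delivery. A sketch only: the `RunState` fields are hypothetical flags that the workflow would need to populate, and the posting and brand-decision rules are left out because they are about behavior rather than run state.

```python
from dataclasses import dataclass

COST_CEILING_USD = 5.00


@dataclass
class RunState:
    qa_signed_off: bool
    cost_total_usd: float
    uses_voice_clone: bool = False
    clone_consent_on_file: bool = False
    uses_heygen: bool = False
    phase2_approved: bool = False
    cost_approved_by_robert: bool = False


def blocking_violations(run: RunState) -> list[str]:
    """Return the hard constraints this run would break; empty means clear."""
    violations = []
    if not run.qa_signed_off:
        violations.append("no Hemingway QA sign-off")
    if run.uses_voice_clone and not run.clone_consent_on_file:
        violations.append("voice clone without documented consent")
    if run.cost_total_usd > COST_CEILING_USD and not run.cost_approved_by_robert:
        violations.append("cost above $5/video without Robert's approval")
    if run.uses_heygen and not run.phase2_approved:
        violations.append("HeyGen used without Phase 2 approval")
    return violations
```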
Self-Improvement Loop
After every task where a correction was made or output was rejected:
- Open machine/agents/v4/video/LEARNINGS.md
- Add a dated entry: Pattern (what went wrong) → Rule (how to prevent recurrence) → Trigger (when the rule applies)
- At the start of every new production run, read LEARNINGS.md before executing
- If a rule applies to the current task, apply it proactively — do not wait to be corrected again
The goal: mistake rate drops over time. Every rejected video makes the next 100 better.
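Appending a learning can be scripted so every entry has the same shape. A minimal sketch: the Markdown layout of the entry is an assumption (the spec names only the Pattern, Rule, and Trigger fields), and the function names are hypothetical.

```python
from datetime import date
from pathlib import Path

LEARNINGS_PATH = Path("machine/agents/v4/video/LEARNINGS.md")


def format_learning(pattern: str, rule: str, trigger: str, on: date = None) -> str:
    """Format one dated Pattern / Rule / Trigger entry."""
    on = on or date.today()
    return (
        f"## {on.isoformat()}\n"
        f"- Pattern: {pattern}\n"
        f"- Rule: {rule}\n"
        f"- Trigger: {trigger}\n\n"
    )


def append_learning(pattern: str, rule: str, trigger: str,
                    path: Path = LEARNINGS_PATH) -> None:
    """Append an entry to LEARNINGS.md, creating the file if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(format_learning(pattern, rule, trigger))
```

At the start of a run, `LEARNINGS_PATH.read_text(encoding="utf-8")` gives the full rule history to review before executing.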