Open-Source AI Filmmaking Pipeline
The SLATE System
10 MCP servers that match or exceed a $600M proprietary system — using open-source tools, standard footage, and no special hardware.
01 — The Problem
In March 2026, Netflix acquired InterPositive for $600M. A 16-person company with no website and no public product. Their asset: a patent describing "Integration of video language models with AI for filmmaking."
Founded by Ben Affleck, InterPositive's system uses a "gray stage" workflow — actors perform on a bare stage while proprietary hardware captures everything. AI generates sets, lighting, extras, and environments in post-production.
The patent promises dramatic cost reductions across every line item in a film budget. But it requires expensive, purpose-built infrastructure that only works inside their pipeline.
Patent US12438995B1 — "Integration of video language models with AI for filmmaking" — Filed by InterPositive Media, granted 2026.
02 — Our Approach
Studio Layering And Transformation Engine. Ten independent MCP servers, each replaceable, each best-in-class.
| InterPositive | SLATE |
|---|---|
| Monolithic proprietary pipeline | 10 independent MCP servers |
| Physical LiDAR hardware required | Neural depth estimation (software-only) |
| Must shoot on their stage | Works with ANY footage from ANY camera |
| Proprietary training data | Open-source models + standard footage |
| One integrated model for everything | Best-in-class model for each task |
03 — The 10 Modules
Each module is an independent MCP server. Swap any model, upgrade any component, without touching the rest of the pipeline.
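To make "independent MCP server" concrete: MCP clients and servers exchange JSON-RPC 2.0 messages, and invoking a tool is a `tools/call` request. The sketch below only builds such a request. The tool name `estimate_depth` and its arguments are hypothetical and are not taken from the SLATE code.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool on slate-depth-mcp; the name and arguments are assumptions.
message = build_tool_call("estimate_depth", {"video_path": "take_03.mov", "frame": 120})
print(message)
```

Because every module would speak the same message shape, swapping the model behind a tool should not change what callers send.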
slate-metadata-mcp
InterPositive
Proprietary on-set hardware captures camera metadata in real time. Limited to what their sensors record during the shoot.
SLATE
FFprobe + EXIF extraction + ARRI/RED sidecars for camera data. MASt3R for 3D scene geometry from video alone. YOLO v9 for scene content tagging. Whisper for audio transcription.
Why ours is better
Camera + scene + audio metadata vs just camera metadata. Works on ANY footage retroactively — including archival material shot decades ago.
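A minimal sketch of the FFprobe step, assuming `ffprobe` is on the PATH. The flags shown are standard ffprobe options; the function names and the shape of the summary dict are illustrative choices, not part of SLATE.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe call that prints container and stream info as JSON."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_format", "-show_streams", path]


def summarize_probe(probe: dict) -> dict:
    """Pull basic camera-side metadata out of parsed ffprobe JSON."""
    video = next((s for s in probe.get("streams", [])
                  if s.get("codec_type") == "video"), {})
    # avg_frame_rate is a rational string such as "24000/1001".
    num, _, den = video.get("avg_frame_rate", "0/1").partition("/")
    den = den or "1"
    fps = float(num) / float(den) if float(den) else 0.0
    return {
        "codec": video.get("codec_name"),
        "width": video.get("width"),
        "height": video.get("height"),
        "fps": round(fps, 3),
        "duration_s": float(probe.get("format", {}).get("duration", 0.0)),
    }


def probe_file(path: str) -> dict:
    """Run ffprobe on a real file (requires ffprobe to be installed)."""
    result = subprocess.run(ffprobe_command(path), capture_output=True,
                            text=True, check=True)
    return summarize_probe(json.loads(result.stdout))


# Hand-written sample in ffprobe's JSON layout, so the parser runs without a file.
sample = {
    "streams": [
        {"codec_type": "audio", "codec_name": "pcm_s24le"},
        {"codec_type": "video", "codec_name": "prores", "width": 3840,
         "height": 2160, "avg_frame_rate": "24000/1001"},
    ],
    "format": {"duration": "12.5"},
}
summary = summarize_probe(sample)
print(summary)
```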
slate-depth-mcp
InterPositive
Physical LiDAR scanners co-mounted with cameras. Requires custom hardware on every rig, every shoot.
SLATE
Apple DepthPro delivers metric depth maps at 2.25MP in 0.3 seconds on a standard GPU. MASt3R generates multi-view 3D point clouds with camera poses. DELTA provides dense 3D tracking across frames.
Why ours is better
Works on ANY existing footage — even material shot years ago. Their LiDAR data cannot be applied retroactively. Our approach turns every camera into a depth sensor.
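One way to read "every camera becomes a depth sensor": once a model supplies a metric depth map and a focal length in pixels, each pixel can be back-projected to a 3D point with the pinhole camera model. The sketch below uses plain Python lists and toy values; it is not tied to any particular model's API, and the function name is an assumption.

```python
def backproject(depth, focal_px, cx, cy):
    """Convert a metric depth map (rows of metres) to camera-space XYZ points.

    Uses the pinhole model: X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / focal_px
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points


# Toy 2x2 depth map: top row 2 m away, bottom row 4 m away.
depth = [[2.0, 2.0],
         [4.0, 4.0]]
points = backproject(depth, focal_px=2.0, cx=0.5, cy=0.5)
print(points)
```

A real pipeline would do this with array operations over millions of pixels; the arithmetic per pixel is the same.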
slate-scene-mcp
InterPositive
Tracks changes between takes using manually logged metadata. Relies on what crew members remember to record.
SLATE
Easi3R performs training-free dynamic scene separation. SAM 2 segments and tracks any object across frames. DELTA provides 3D point tracking for spatial consistency verification.
Why ours is better
SEES what changed between takes vs relying on what was LOGGED. Catches un-logged changes automatically — a moved prop, a shifted light, a changed costume detail.
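A simplified sketch of the take-to-take comparison, assuming segmentation and 3D tracking have already produced one labelled centroid per object per take. The object names, coordinates, and the 5 cm threshold are made up for illustration.

```python
import math


def diff_takes(take_a, take_b, moved_threshold_m=0.05):
    """Compare per-object 3D centroids (metres) between two takes.

    Returns (object, change) pairs for objects that vanished, appeared,
    or moved farther than the threshold.
    """
    changes = []
    for name in sorted(set(take_a) | set(take_b)):
        if name not in take_b:
            changes.append((name, "missing"))
        elif name not in take_a:
            changes.append((name, "added"))
        else:
            distance = math.dist(take_a[name], take_b[name])
            if distance > moved_threshold_m:
                changes.append((name, f"moved {distance:.2f} m"))
    return changes


# Hypothetical tracked objects for two takes of the same setup.
take_1 = {"mug": (0.40, 0.90, 2.10), "lamp": (1.20, 1.50, 3.00), "chair": (0.00, 0.00, 2.50)}
take_2 = {"mug": (0.70, 0.90, 2.50), "lamp": (1.20, 1.50, 3.00), "coat": (2.00, 1.00, 3.20)}
changes = diff_takes(take_1, take_2)
print(changes)
```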
slate-camera-mcp
InterPositive
Records exact camera motion from physical dolly rigs. Locked to their proprietary hardware and stage setup.
SLATE
DUSt3R/MASt3R extract camera paths from ANY video. Procedural physics simulation for virtual camera rigs. Library of extracted real camera motions from classic films.
Why ours is better
Can EXTRACT camera motion from any film ever made and apply it to new work. "Give me the Steadicam path from The Shining's hallway" is a valid input. They can only replay their own rig data.
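Applying an extracted camera path to a new shot usually means retiming it to a different frame count and sometimes rescaling it to a different set size. The sketch below does that for positions only, with linear interpolation; rotations are left out for brevity, and the path values are invented rather than extracted from any film.

```python
def resample_path(positions, frame_count, scale=1.0):
    """Linearly resample camera positions to `frame_count` frames.

    `positions` is a list of (x, y, z) tuples; `scale` multiplies every
    coordinate, e.g. to fit the move into a larger or smaller space.
    """
    if frame_count < 2 or len(positions) < 2:
        raise ValueError("need at least two source positions and two output frames")
    last = len(positions) - 1
    resampled = []
    for i in range(frame_count):
        t = i * last / (frame_count - 1)   # fractional index into the source path
        lo = min(int(t), last - 1)         # clamp so lo + 1 stays in range
        frac = t - lo
        a, b = positions[lo], positions[lo + 1]
        resampled.append(tuple(scale * (pa + frac * (pb - pa)) for pa, pb in zip(a, b)))
    return resampled


# Hypothetical three-keyframe move: push forward, then drift right.
path = [(0.0, 1.5, 0.0), (0.0, 1.5, 2.0), (1.0, 1.5, 4.0)]
retimed = resample_path(path, frame_count=5)
print(retimed)
```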
slate-lighting-mcp
InterPositive
Relights scenes using depth and normals from controlled training data captured on their stage.
SLATE
RelightMaster (SOTA video relighting with Multi-Plane Light Images). The IC-Light paper received perfect reviewer scores (10/10/10/10) at ICLR 2025. Light-A-Video provides training-free temporal consistency. FFmpeg LUT pipeline for color grading.
Why ours is better
The IC-Light paper received PERFECT reviewer scores at ICLR 2025. RelightMaster reports state-of-the-art results against the baselines in its paper. Their patent was filed before these models existed, and the open-source field has already surpassed their approach.
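The FFmpeg LUT step is the simplest piece to show. FFmpeg's `lut3d` video filter applies a 3D LUT such as a `.cube` file. The sketch below only assembles the command; the file names are placeholders, and paths containing characters like `:` would need escaping for the filter syntax.

```python
import shlex


def lut_grade_command(src: str, lut_path: str, dst: str) -> list[str]:
    """Build an ffmpeg command that applies a 3D LUT and copies the audio."""
    return ["ffmpeg", "-i", src,
            "-vf", f"lut3d=file={lut_path}",
            "-c:a", "copy",
            dst]


# Placeholder file names, for illustration only.
cmd = lut_grade_command("relit.mov", "grade.cube", "graded.mov")
print(shlex.join(cmd))
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.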
slate-narrative-mcp
InterPositive
Internal ML model checks visual continuity between shots. Limited to what the camera sees — surface-level consistency.
SLATE
Scene graph database maintains relationship maps of every character, prop, and location. Claude Opus for deep reasoning about narrative logic. Open Brain persistent memory for cross-session continuity tracking.
Why ours is better
Checks visual + narrative + logical continuity simultaneously. Catches story-level plot holes their vision-only model cannot — a character who shouldn't know something yet, a timeline contradiction, a missing motivation.
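The "character who shouldn't know something yet" check can be illustrated without a language model. Assume the scene graph yields a list of events recording when a character learns a fact and when they act on it. The event format, character names, and facts below are hypothetical.

```python
def find_knowledge_violations(events):
    """Flag uses of a fact that happen before the character has learned it.

    `events` is a list of (scene_number, kind, character, fact) tuples,
    where kind is "learns" or "uses".
    """
    learned = {}
    violations = []
    # sorted() is stable, so events within one scene keep their listed order.
    for scene, kind, character, fact in sorted(events, key=lambda e: e[0]):
        key = (character, fact)
        if kind == "learns":
            learned.setdefault(key, scene)
        elif kind == "uses" and key not in learned:
            violations.append((scene, character, fact))
    return violations


# Hypothetical script events, deliberately listed out of order.
events = [
    (12, "learns", "Mara", "the safe code"),
    (8, "uses", "Mara", "the safe code"),
    (14, "uses", "Mara", "the safe code"),
    (3, "learns", "Jonas", "the safe code"),
    (5, "uses", "Jonas", "the safe code"),
]
violations = find_knowledge_violations(events)
print(violations)
```

In the design described above, a reasoning model would presumably extract these events from the script and footage; the rule-based pass shows only the consistency check itself.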
slate-cinelang-mcp
InterPositive
Custom tokenizer trained on filmmaking terminology. Translates industry jargon into internal system parameters.
SLATE
Comprehensive cinematographic ontology with relationship mapping. LoRA fine-tuning trained on the ASC Manual, Cinemetrics database (15,000+ films), and director commentary tracks.
Why ours is better
Knows filmmaking vocabulary + film history + director intent + the emotional psychology behind lens choices. "Spielberg oner" means something fundamentally different than "Scorsese oner" — our system understands that distinction.
slate-inference-mcp
InterPositive
Single proprietary model stack handles all tasks. If one capability lags, the entire pipeline is constrained.
SLATE
Intelligent routing to best-in-class model per task: DepthPro for depth, MASt3R for 3D reconstruction, RelightMaster for lighting, Claude for reasoning, Veo 3.1 or Kling 3.0 for generation, local ONNX for latency-critical inference.
Why ours is better
Best available model for each subtask, independently upgradeable. When a better depth model ships next month, swap it in without rebuilding anything else. Their monolith requires retraining the entire stack.
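Per-task routing with swappable backends can be as small as a registry keyed by task name. In this sketch the backends are stand-in functions that return strings; a real router would call the model servers, and every name here is an assumption.

```python
from typing import Callable

# Task name -> handler. Re-registering a task replaces its backend.
ROUTES: dict[str, Callable[[str], str]] = {}


def register(task: str):
    """Decorator that installs a function as the backend for a task."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        ROUTES[task] = fn
        return fn
    return decorator


@register("depth")
def depth_backend(payload: str) -> str:
    return f"depthpro:{payload}"       # stand-in for a DepthPro call


@register("reconstruction")
def reconstruction_backend(payload: str) -> str:
    return f"mast3r:{payload}"         # stand-in for a MASt3R call


def route(task: str, payload: str) -> str:
    """Dispatch a payload to whichever backend is registered for the task."""
    if task not in ROUTES:
        raise ValueError(f"no backend registered for task {task!r}")
    return ROUTES[task](payload)


before = route("depth", "frame_0120")

# Swap in a newer depth model without touching any other route.
register("depth")(lambda payload: f"new-depth-model:{payload}")
after = route("depth", "frame_0120")
print(before, after)
```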
slate-prompt-mcp
InterPositive
Custom model translates natural language prompts into internal pipeline parameters. Limited to their system's vocabulary.
SLATE
Cinematographic ontology + Claude Opus for deep intent understanding. Structured output compatible with ANY generation API. ControlNet conditioning for precise spatial and compositional control.
Why ours is better
Understands cinematic references and director-specific styles as creative intent, not just keywords. "Spielberg oner" produces a different camera plan than "Scorsese oner" — different blocking, different energy, different emotional arc.
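"Structured output compatible with ANY generation API" suggests a plain, serializable shot description that adapters can translate per provider. A minimal sketch follows; the field names and example values are illustrative assumptions, not SLATE's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ShotPlan:
    """Provider-neutral description of one shot (hypothetical schema)."""
    description: str
    camera_move: str
    lens_mm: int
    duration_s: float
    references: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with sorted keys so output is stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)


plan = ShotPlan(
    description="Unbroken take following a waiter through a crowded kitchen",
    camera_move="steadicam follow",
    lens_mm=35,
    duration_s=45.0,
    references=["long take", "handheld energy"],
)
payload = plan.to_json()
print(payload)
```

Each generation backend would then need only a small adapter from this JSON to its own request format.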
slate-render-mcp
InterPositive
Single model generates final frames. Output is baked — adjustments require re-rendering everything from scratch.
SLATE
Multi-source compositing: AI-generated video + 3D renders + overlays + color grade, all as separate layers. Film stock emulation (grain, halation, gate weave, LUTs). ProRes/XAVC delivery at 4K, 6K, or 8K.
Why ours is better
Separate layers mean non-destructive editing. Adjust relighting without touching the base video. Swap the sky without re-rendering actors. This is a professional VFX workflow, not a monolithic render pass.
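Layered compositing rests on the Porter-Duff "over" operator. The sketch below flattens single RGBA pixels with straight (non-premultiplied) alpha; a real compositor applies the same arithmetic per pixel across whole frames. The layer values are arbitrary.

```python
def over(top, bottom):
    """Porter-Duff 'over' for straight-alpha RGBA pixels, channels in [0, 1]."""
    *top_rgb, top_a = top
    *bot_rgb, bot_a = bottom
    out_a = top_a + bot_a * (1.0 - top_a)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    rgb = tuple((t * top_a + b * bot_a * (1.0 - top_a)) / out_a
                for t, b in zip(top_rgb, bot_rgb))
    return (*rgb, out_a)


def composite(layers):
    """Flatten layers listed bottom to top into one pixel."""
    result = (0.0, 0.0, 0.0, 0.0)      # start from full transparency
    for layer in layers:
        result = over(layer, result)
    return result


base = (0.2, 0.4, 0.6, 1.0)            # e.g. the base video pixel
relight = (1.0, 1.0, 1.0, 0.5)         # e.g. a half-opaque relighting layer
flat = composite([base, relight])

# Non-destructive: replace the top layer and re-flatten; `base` is untouched.
warmer = composite([base, (1.0, 0.8, 0.6, 0.5)])
print(flat, warmer)
```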
04 — Pipeline
From creative intent to final deliverable. Each node is an independent MCP server that can be monitored, replaced, or scaled individually.
Pipeline stages: Creative Input, Generation, Analysis & Enhancement, Finishing.
05 — Build Phases
Prioritized by what can run today with zero GPU, scaling up to full pipeline integration.
Phase 1: Now (no GPU required)
Phase 2: PyTorch CPU inference
Phase 3: API integration
Phase 4: Integration & compositing
06 — Open Source Stack
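The open models in the SLATE pipeline, each independently verifiable and replaceable. Licenses vary: MASt3R, DUSt3R, and CoTracker 3 are under non-commercial Creative Commons terms, and RelightMaster is listed by its research paper only.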
| Tool | Module | License | Repository |
|---|---|---|---|
| DepthPro | slate-depth-mcp | Apple License | apple/ml-depth-pro |
| MASt3R | slate-depth-mcp, slate-metadata-mcp, slate-camera-mcp | CC BY-NC-SA 4.0 | naver/mast3r |
| DUSt3R | slate-camera-mcp | CC BY-NC-SA 4.0 | naver/dust3r |
| Easi3R | slate-scene-mcp | Apache 2.0 | Inception3D/Easi3R |
| SAM 2 | slate-scene-mcp | Apache 2.0 | facebookresearch/sam2 |
| CoTracker 3 | slate-scene-mcp, slate-camera-mcp | CC BY-NC 4.0 | facebookresearch/co-tracker |
| DELTA | slate-depth-mcp, slate-scene-mcp | MIT | snap-research/DELTA |
| SEA-RAFT | slate-camera-mcp | BSD 3-Clause | princeton-vl/SEA-RAFT |
| RAFT | slate-camera-mcp | BSD 3-Clause | princeton-vl/RAFT |
| IC-Light | slate-lighting-mcp | Apache 2.0 | lllyasviel/IC-Light |
| Light-A-Video | slate-lighting-mcp | Apache 2.0 | bcmi/Light-A-Video |
| YOLO v9 | slate-metadata-mcp | AGPL 3.0 | ultralytics/ultralytics |
| Whisper | slate-metadata-mcp | MIT | openai/whisper |
| ComfyUI | slate-render-mcp | GPL 3.0 | comfyanonymous/ComfyUI |
| RelightMaster | slate-lighting-mcp | Research Paper | arxiv.org/abs/2511.06271 |