Three data points to set the temperature:
① In March 2026, Yann LeCun's AMI Labs closed a $1.03B seed round — the largest in European history. The same month, Fei-Fei Li's World Labs crossed $1.2B raised. Two researchers, $2.2B in cheques, both building the same thing — world models.
② Ant's "Lingguang" app shipped, hit 2M downloads in six days, and users built 3.3M "flash apps" in two weeks. Mini interactive games were the most popular category. One sentence, 30 seconds, a playable game.
③ Kunlun's Matrix-Game 3.0 hit 5B parameters, 720p, 40 fps real-time game generation. Two years ago the ceiling here was "play DOOM for three seconds."
Top researchers are walking out of LLM labs with billion-dollar cheques to do world models. Ordinary users are already vibe-coding playable mini-games in 30 seconds. World models are being pulled toward the mainstream from both ends: the research peak and the consumer floor.
What is it? Why now?
Bluntly: the LLM is blind. No matter how strong it gets, it's still just predicting the next token. Ask it to describe gravity — it'll write you 3,000 words. Ask it to simulate a ball rolling off a table — there is no "table" in its head.
World models are the fix: teach AI not just to talk, but to see the road.
Why this is suddenly happening in 2026
Three forces converging:
- LLM scaling is hitting a wall. Pre-training data is near the ceiling. Pure-compute returns are diminishing. OpenAI internally admits GPT-5 underdelivered. The industry needs a new narrative, and you can't tell LPs "we're still scaling" for the third year running.
- Video generation exposed the missing piece: understanding. Sora, Veo: beautiful frames, terrible physical consistency. A person walking grows a third leg. That's not AI, that's Photoshop on autopilot. The field is having an honest moment: pretty pixels aren't enough; the model needs to actually understand.
- Embodied AI and robotics need training environments. Robots can't afford trial-and-error in the real world. World models are the natural simulation engine.
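To make "simulation engine" concrete, here is a minimal sketch of planning inside a model instead of in the real world. Everything in it is a toy assumption: `world_model` is a hand-written stand-in for learned dynamics, the robot lives on a 1-D line, and the planner is a plain exhaustive search over short action sequences, not any lab's actual method.

```python
import itertools
from typing import Sequence, Tuple


def world_model(position: int, action: int) -> int:
    """Toy stand-in for learned dynamics: the robot moves by the action."""
    return position + action


def imagined_cost(start: int, plan: Sequence[int], goal: int) -> int:
    """Roll a plan forward inside the model and return distance to the goal."""
    position = start
    for action in plan:
        position = world_model(position, action)
    return abs(goal - position)


def plan_in_imagination(
    start: int, goal: int, horizon: int, actions: Tuple[int, ...] = (-1, 0, 1)
) -> Tuple[int, ...]:
    """Try every action sequence in the model; the real robot never moves."""
    candidates = itertools.product(actions, repeat=horizon)
    return min(candidates, key=lambda plan: imagined_cost(start, plan, goal))


# 27 imagined rollouts, zero real-world trials.
best_plan = plan_in_imagination(start=0, goal=3, horizon=3)
print(best_plan)
```

The point is the shape of the loop: all the trial-and-error happens in the model, and only the winning plan would ever be sent to hardware. Real systems replace the exhaustive search with sampling or learned policies, because the number of candidates grows exponentially with the horizon.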
The major players
Yann LeCun / AMI Labs — the most expensive bet on first principles
March 2026: LeCun announces AMI Labs. $1.03B seed at a $3.5B valuation. Largest European seed ever. No product. Bezos Expeditions, NVIDIA, Samsung, Toyota Ventures all on the cap table.
The bet is JEPA (Joint Embedding Predictive Architecture) — don't predict pixels, predict the next world state in an abstract embedding space. Translation: not "what does the next frame look like" but "what is the state of the world about to be." LeCun's own line: "LLMs are too limited. Scaling won't get us to AGI."
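A toy sketch of what "predict in embedding space" means, under loud assumptions: the encoder is a fixed linear projection, the predictor is a one-line shift, and all names (`encode`, `predict_next`, `latent_loss`, `WEIGHTS`) are hypothetical. A real JEPA trains both encoder and predictor; this only shows where the loss is measured.

```python
from typing import List

Vector = List[float]


def encode(frame: Vector, weights: List[Vector]) -> Vector:
    """Project a flat 'pixel' frame to a small embedding (toy linear encoder)."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def predict_next(embedding: Vector, action: float, gain: float) -> Vector:
    """Toy predictor: shift every embedding coordinate by gain * action."""
    return [z + gain * action for z in embedding]


def latent_loss(predicted: Vector, target: Vector) -> float:
    """Mean squared error measured in embedding space, not pixel space."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical 4-pixel frames and a 2-dimensional embedding.
WEIGHTS = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
frame_t = [0.0, 2.0, 4.0, 6.0]
frame_next = [1.0, 3.0, 5.0, 7.0]  # every pixel brightened by 1

z_t = encode(frame_t, WEIGHTS)
z_next = encode(frame_next, WEIGHTS)
z_pred = predict_next(z_t, action=1.0, gain=1.0)
loss = latent_loss(z_pred, z_next)
print(z_pred, loss)
```

The predictor never outputs a frame. It is scored only on whether the predicted embedding matches the embedding of what actually happened, which is why this family is cheap and plannable but produces no picture on its own.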
This is a long-horizon bet. CEO Alex LeBrun says the first usable thing is roughly a year out. Investors are buying LeCun's name and ten years of academic groundwork, not near-term returns. Target verticals: healthcare, robotics, wearables, industrial automation.
Fei-Fei Li / World Labs / Marble — the first to commercialise
Completely different playbook. Marble shipped November 2025; another $1B raised in February 2026 (NVIDIA, AMD, Autodesk). Currently the only world-model product with pricing and consumer access.
Marble's positioning: multimodal 3D world generation. Feed it a photo, a sentence, or a video clip — out comes a 3D space you can walk into at 360°. Not a panorama, real 3D, exportable as Gaussian splats and triangle meshes, directly droppable into Unity or Unreal.
They also shipped Chisel — an editor where you block out spatial structure first and AI fills in visual detail. Structure and style separated, so the creator stays in control instead of being dragged around by the model.
Traditional 3D scene work: tens of thousands to hundreds of thousands of dollars, days to months. Marble: a few minutes. Pricing: free to $95/mo. The claim that this kills game-art jobs is overstated; it more likely kills the "3D environment outsourcing" business.
Google DeepMind / Genie 3 — the deepest research bench
DeepMind's accumulated lead in world models is real. Genie 1 → Genie 2 → Genie 3 (August 2025), trained on 30,000+ hours of game footage. Generates interactive 3D environments from text in real time at 24 fps. Research preview opened January 2026.
They frame it as a key stepping-stone to AGI — once you have a world model, you can infinitely generate training environments to teach agents. Still preview-only, no commercial product.
Decart / Oasis — the first one you can actually play

Oasis generating Minecraft-style gameplay in real time — every frame AI-generated, no traditional engine.
Oasis claims to be the first fully AI-generated playable game in real time. You control with keyboard and mouse; a Diffusion Transformer generates each frame at 20 fps with zero latency. No pre-rendering, no traditional engine — every pixel, every physics interaction, every rule is improvised by the model. Looks like Minecraft. 500M-parameter version open-sourced.
Limits: blurry, imprecise inventory handling, long-session consistency collapses. But a giant leap from Google's GameNGen (3 seconds of DOOM) in 2024. Lucy 2.0 and Custom Worlds let you upload an image and turn it into a playable world.
Honestly: today's Oasis is more "proof that AI can make games" than a game you'd play. But proof matters in this industry — sometimes more than the game.
Runway — from video tool to world simulator
Started as an AI video generator (Gen-1 → Gen-3, popular in film and ad work). Then in late 2025 shipped GWM-1, a "general world model." Tagline changed to Building AI to Simulate the World. March 2026: a $10M fund for AI + world-simulation companies.
The play is clear: go from tool to platform. I think Runway is the most undervalued name on this list. Real revenue (video SaaS), mature user base (film/ad industry), accumulated tech (Gen series ports straight into world modelling). Not pure cash-burn research like AMI Labs; Runway is earning while pivoting. That path has a far higher success rate.
Tencent HunyuanWorld + Kunlun Matrix-Game
Tencent anchors HunyuanWorld in its own games, maps, and AR/VR business. WorldPlay interactive model, international version on Tencent Cloud (Nov 2025), open-source version past 3M downloads. The advantage is real demand pulling the model.
Kunlun shipped three models at Zhongguancun Forum in March 2026: Matrix-Game 3.0 (5B params, 720p, 40 fps real-time, integrated with GTA V and Cyberpunk 2077 for data collection), SkyReels V4, Mureka V9. Plus Cat Forest Academy, an "AI Roblox."
Two years of interactive AI games

World-model milestones: from 3 seconds of DOOM to a billion-dollar seed round.

GameNGen comparison: early world models (left, unreadable blur), GameGAN (middle), GameNGen (right, near-identical to real DOOM).
- Aug 2024: GameNGen. Google Research, modified Stable Diffusion, playable DOOM at 20 fps on a single TPU. Two-stage training: RL agent plays DOOM to collect trajectories, then the model learns "previous frames + action → next frame." Humans can't tell AI from real footage, but only for 3 seconds. The proof of concept.
- Oct 2024: Oasis. Decart, swap to DiT + Diffusion Forcing, open-world Minecraft, 20 fps. From 3 seconds of DOOM to persistent open world. Big jump.
- Aug 2025: Genie 3. DeepMind, text-driven interactive environment generation. Not constrained to imitating an existing game; it invents new worlds. 30,000+ hours of training data.
- Mar 2026: Matrix-Game 3.0. Kunlun, 5B params, 720p, 40 fps, Memory mechanism for long-horizon consistency, dual pipeline (Unreal synthetic + AAA gameplay data). Starts looking industrial.
Two years: DOOM-3s → 720p 40fps open-world. Fast progress, but a long way from replacing traditional engines.
The common problems: blurry visuals, imprecise physics, long-session consistency collapse.
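Why consistency collapses is easier to see in a toy rollout. The sketch below assumes a 1-D "world" and a stand-in model that is right up to a small constant bias; `true_step`, `learned_step`, and the bias value are illustrative inventions, not measurements of any real system. It follows the "previous frames + action → next frame" loop and feeds each prediction back in as context.

```python
from collections import deque
from typing import Deque, List


def true_step(position: float, action: float) -> float:
    """Ground-truth 'engine': position moves by exactly the action."""
    return position + action


def learned_step(context: Deque[float], action: float, bias: float) -> float:
    """Stand-in for a learned next-frame model: correct up to a small bias.

    Only the newest frame in the context window is used here; a real
    model would condition on the whole window.
    """
    return context[-1] + action + bias


def rollout_errors(steps: int, bias: float, window: int = 4) -> List[float]:
    """Roll both systems forward and record the absolute gap at each step."""
    truth = 0.0
    context: Deque[float] = deque([0.0], maxlen=window)
    errors = []
    for _ in range(steps):
        action = 1.0
        truth = true_step(truth, action)
        predicted = learned_step(context, action, bias)
        context.append(predicted)  # the model consumes its own output
        errors.append(abs(predicted - truth))
    return errors


# 60 steps is 3 seconds at 20 fps.
errors = rollout_errors(steps=60, bias=0.25)
print(errors[0], errors[-1])
```

Because the model never sees the true state again, a per-step error that looks harmless on frame 1 is 60 times larger by frame 60. Memory mechanisms and Diffusion Forcing are attempts to slow that growth, not to remove the feedback loop.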
My read: the realistic shipping vector for AI interactive games is incremental, not replacement — fast scene generation, NPC behaviour, procedural levels. Whole-game real-time AI generation? Another two to three years. Anyone shouting "the game engine is dead" probably hasn't shipped a game. Try getting two players in the same AI-generated world to see identical physics — determinism alone kills 99% of these pitches.
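The determinism problem fits in a few lines. This is a toy sketch, assuming a sampled "world" where random noise stands in for sampling in a generative model; `simulate` and its parameters are hypothetical and do not describe how any named product works.

```python
import hashlib
import random
from typing import List


def simulate(seed: int, inputs: List[int], noise: float = 0.1) -> str:
    """Run a toy sampled world and return a hash of the final state."""
    rng = random.Random(seed)  # per-client random stream
    position = 0.0
    for move in inputs:
        # Same player input, plus whatever the sampler happened to draw.
        position += move + rng.uniform(-noise, noise)
    return hashlib.sha256(repr(position).encode()).hexdigest()


inputs = [1, 1, -1, 1, 0, 1]
player_a = simulate(seed=42, inputs=inputs)
player_b_same_seed = simulate(seed=42, inputs=inputs)
player_b_own_seed = simulate(seed=7, inputs=inputs)

in_sync = player_a == player_b_same_seed
desynced = player_a != player_b_own_seed
print(in_sync, desynced)
```

Two clients agree only when they share the seed and the exact input sequence. Even then, this toy runs on one CPU; on real hardware, floating-point results can differ across GPUs and kernels, so a shared seed is necessary but not sufficient. A traditional engine gets lockstep multiplayer from a deterministic update rule, and that is the bar a generated world has to clear.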
Four technical paths

NeoVerse 4D world model: monocular video → 4D Gaussian reconstruction → arbitrary-view generation (CVPR 2026).

DreamerV3: the agent 'dreams' practice runs inside a learned world model, then transfers to the real environment (Nature 2025).

Radar of the four paths' capabilities: no single path dominates every axis — which is why I'd bet on the hybrid play.
| Path | Method | Examples | Strengths | Weaknesses |
|---|---|---|---|---|
| DiT-generative | Diffusion, frame by frame | Oasis, Matrix-Game 3.0 | Visually direct, good UX | Doesn't understand physics, long-horizon breakdown |
| JEPA | Predict in embedding space | V-JEPA 2, AMI Labs | Efficient, semantic, plannable | No visual output |
| 3D/4D reconstruction | 2D → persistent 3D structure | Marble, NeoVerse | Geometric consistency, editable | Weak on dynamic scenes |
| Hybrid RL + simulation | RL + world model + physics | Genie 3, Cosmos, Dreamer | Closest to true reasoning | Compute-heavy |
No path dominates on every axis — which is exactly why I'd bet on the hybrid play winning long-term.
A product map

The world-model & interactive-AI quadrant: the top-right 'gold zone' is the least crowded, the top-left 'lightweight breakout' the most.
Easiest mistake on this beat: every conversation about world models defaults to LeCun and Fei-Fei Li and the multi-billion-dollar money. There are 40+ products running in parallel. By raise size:
Tier 1 — Giants and unicorns ($100M+). AMI Labs ($1.03B / JEPA), World Labs / Marble ($1.23B / 3D world gen), Google DeepMind / Genie 3 (not independent / interactive world gen), NVIDIA Cosmos (platform / physical-AI foundation), Runway GWM-1 ($860M cumulative / general world model, three lines), General Intuition ($134M seed / spatial reasoning agents). Common trait: foundation models or platforms, burning hot, commercialisation mostly unproven.
Tier 2 — Mid-tier with real revenue or $5M–$50M raised. Decart / Oasis ($53M), Odyssey Explorer (former self-driving team, now interactive video), Kunlun Matrix-Game (public-company-funded), Tencent HunyuanWorld series, Inworld AI (NPC engine, integrated with Unity/Unreal, validated by Skyrim mod community), Charisma.ai (dialogue / narrative AI), RPGGO (pre-seed, text-to-open-world RPG, Tencent diaspora team), Scenario (game art assets, custom style training), Rosebud (full-pipeline web game creation), SEELE / Baidu-backed (end-to-end 3D game gen, Unity export), WebSim ($11M, AI web/game generator), Jenova.ai (agent-driven roleplay and narrative), SpAItial ($13M seed, image → 3D Gaussian Splat). Common trait: clear vertical or validated user data.
Tier 3 — Small/open-source/early. MakeGamesWithAI, Spawn.co, Ludo.ai, Saga, AI Town, Layer AI, Meshy, Cascadeur, Replica Studios, Leonardo AI, Convai, Promethean AI, AIVA + Beatoven.ai, Etched / Sohu (custom Transformer ASIC, Oasis's hardware partner), Yume 1.5, Microsoft Muse (Xbox research), RADiCAL, Figma Make, Google Playables Builder (YouTube-embedded, Gemini 3-driven). Trait: narrow wedges, but if the category breaks open, each is a potential ecosystem piece.
My read: most people only watch Tier 1 because the headlines are loudest. The money is closer to Tier 2: Inworld is validated by Skyrim modders, Scenario gets cash from indie devs, and Tencent HunyuanWorld is past 3M open-source downloads. Tier 3 looks small, but remember Roblox started as an unremarkable side-tool.
Seven product shapes, with honest business reads
1. Vibe-coded mini-games — 2026's first breakout category
January 2026, the most-talked-about game wasn't a AAA — it was a text-only game smaller than a "mini-game." Big Tech Simulator crushed its servers on launch day. Cyber Hike: Aotaixian Line went viral. One- or two-person teams, no art, just text choices and stat progression. Why? Vibe coding didn't lower the barrier; it erased it. No programming, no art, just an idea plus some token credits. AI generates systems, numbers, branching story. You provide the concept. A weekend ships a game.
For traditional studios this is humbling: your three-year indie game can be out-trended by a college student's weekend build — not because their game is better, but because they were absurdly fast and their concept is meme-shaped.
Business shape: social virality + ads. A link is the entire product. Loads faster than any mini-game. Zero psychological commitment. Naturally viral. Ceiling per game isn't huge (tens to hundreds of thousands of dollars), but production cost is near zero — ROI is extreme.
2. AI flash apps / flash games — Lingguang, Google Playables Builder
Lingguang (Ant Group) is one of late-2025's biggest AI launches in China — 2M downloads in six days, faster than ChatGPT or Sora 2. Core feature: "flash apps" — describe what you want and get a working, editable, shareable mini-app in 30 seconds. Two weeks in: 3.3M user-created flash apps spanning mini-games, mood tools, countdowns, study aids. Later upgrade: "flash games" — "make me a 1942 shooter" and 30 seconds later you have one with editable characters, backgrounds, difficulty.
Google Playables Builder. YouTube's official AI game-generator, Gemini 3-driven. Lets creators turn text, images, or video clips into HTML5 mini-games embedded directly in the watch page. Google's intent is naked: contest Roblox, capture young users' time.
Business shape: platform stickiness + ecosystem lock-in. Lingguang keeps users inside Ant's ecosystem — Alipay mini-programs, credit, payments. Expect "generative mini-program" booms across ByteDance, Alibaba, Tencent in 2026 — generation grafted onto existing payment/social/commerce surfaces. Google Playables turns content from one-way playback to two-way interaction.
The competition here isn't product. It's ecosystem. Whoever owns distribution wins. Lingguang has Alipay, Playables has YouTube — an independent dev competes with what, exactly?
3. AI NPCs / interactive narrative — Charisma.ai, RPGGO
Charisma.ai doesn't make games. It sells an AI dialogue and character system — controllable AI characters, dialogue logic, branching interactions. Used in narrative games, training simulations, brand experiences, education.
RPGGO. Text-to-open-world: feed it a synopsis, get a playable RPG with branching plot, NPCs that remember you, real-time portrait and voice generation. Team from Tencent et al. Pre-seed from Makers Fund.
Jenova.ai. Uses dedicated AI agents for distinct interactive content types — Roleplay Game Master (tabletop-style RPG with infinite memory and arbitrary rule systems), Film Screenwriter, Webtoon Creator. Doesn't train models — orchestrates GPT-5.2, Claude, Gemini 3 in an agent framework. Probably the smartest play for small teams: stay above the model layer, own the scene layer.
Saga. AI text adventure and roleplay platform — retro aesthetic, AI-driven dialogue and plot.
AI Town (Convex). AI characters live autonomously in a virtual town — independent personalities, persistent memory, evolving goals. The product version of Stanford's "25 agents living in a village" paper.
Business shape: B2B middleware + consumer subscription. Charisma sells to studios, schools, brands as an API. RPGGO is consumer subscription. The bigger upside: when an NPC truly remembers you and reacts dynamically, replay value and willingness-to-pay both jump.
4. World-model native products — Oasis, Cat Forest Academy, WebSim

Oasis's open world — every block, sky, and light is computed by AI in real time; not a single pixel is pre-made.
Oasis — covered above. Free demo, commercial path TBD.
Cat Forest Academy 2.0 (Kunlun). Positioned as "AI Roblox" — play and create games with voice commands.
WebSim. Natural-language generator for web pages and small interactive apps. Iterative editing, shareable links. Raised ~$11M. Not a full game engine but excellent for web-game prototyping.
Business shape: UGC platform economics. Don't make games — let users make games, take a cut. Roblox proved the ceiling ($3B+ revenue). AI further drops the barrier from "people who can code" to "people who can talk."
5. 3D world generation tools — Marble, Rosebud, SEELE
Marble. Already covered. $0 to $95/mo. Game devs, VFX studios, architects. Early users dropping Gaussian splats into Unity. Visible on Vision Pro and Quest 3.
Rosebud. Cloud-based full-pipeline game creation — prompt to playable 2D/3D prototype, with built-in sprite animator, AI NPC creator, visual novel tools.
SEELE (Baidu-adjacent). End-to-end multimodal game generator. Text to interactive 3D world, with Unity export (an advantage over Rosebud). 5M+ animation presets, full audio gen, claims 480× faster than hand-coding.
Spawn.co. Natural-language creation of 3D multiplayer worlds, apps, and virtual experiences.
SpAItial. European, $13M seed. Echo model: single image → 3D Gaussian splat. Lighter than Marble: a single scene, not a full world. Good for e-commerce product 3D and interior design previews, where you look at a scene rather than walk into it.
Tencent HunyuanWorld series. The fastest-iterating open-source world-model family right now. v1.0 (Jul 2025): text/image → 360° 3D world, Unity/Unreal export. Voyager (Sep 2025): long-range 3D exploration. v1.1 WorldMirror (Oct 2025): video → 3D. FlashWorld (also Oct 2025): single-GPU 5–10s 3DGS generation. v1.5 WorldPlay (Dec 2025): real-time interactive. Five releases in six months, 3M+ open-source downloads. If you're an indie dev sampling this space, HunyuanWorld is the best-value entry point: free, documented, runs on a 4090.
Business shape: SaaS + cost displacement. Game art is 50–80% of dev cost; a single 3D character can run tens of thousands to nearly a million. Marble's value equation: $100K scenes become $20/mo, done in minutes.
6. AI game-asset toolchain — invisible infrastructure
Not whole games — individual pipeline stages accelerated by AI. Stitched together, they form a fully AI-native production line.
- Scenario — custom-style AI asset generation, 12 generation modes, up to 16 per batch. Essential for under-staffed indie studios.
- Inworld AI — AI NPC engine, ex-Google Dialogflow team, deep Unity/Unreal integration. Modders for Skyrim and Mount & Blade 2 are already paying — players will pay for smarter NPCs.
- Convai — real-time voice NPCs, 200–300 ms latency, suited for VR/AR.
- Replica Studios — AI voice acting with commercial licensing.
- Cascadeur — AI animation. Set keyframes, AI fills the in-between motion. Replaces mocap at orders-of-magnitude lower cost.
- Leonardo AI — bulk art asset generation.
- Meshy — text/image to 3D model.
- Promethean AI — natural language → 3D environments, deep Unreal integration, already in AAA studios.
- AIVA / Beatoven.ai — AI game music.
- Ludo.ai — AI game R&D assistant. Doesn't make assets, doesn't write code — analyses chart-topping games, mixes mechanics, generates playable prototypes.
Business shape: toolchain SaaS, each slicing one stage. The ideal indie workflow: Ludo for concept → Scenario for art → Meshy for 3D → Inworld for NPCs → Replica for voice → AIVA for score. Each charges monthly. By 2026, this Lego-stack is the indie default.
7. World-model interactive video / exploration — the new category

TeleWorld's Macro-from-Micro planning: DiT generates video segment by segment, a macro planner controls long-horizon consistency.
Most cutting-edge, furthest from monetisation, biggest upside.
- Runway GWM-1 / Game Worlds. Three lines: GWM-Worlds (interactive worlds), GWM-Robotics (robot sim with Python SDK), GWM-Avatars (conversational digital humans). Game Worlds is the consumer entry — browser-based interactive text adventures, AI-generated. 720p / 24 fps, physics-aware.
- Odyssey Explorer. "Interactive video" — video you can watch and operate at the same time. 40–50 ms per frame, 20 fps streaming. Causal generation — every action changes every possible future. Training data from a self-driving team's 360° captures. Realistic Gaussian-splat output, exportable to Unreal/Blender/After Effects.
- Microsoft Muse. Xbox's world model, trained on seven years of Bleeding Edge gameplay. Real-time game scenes from controller input. Research only.
- Yume 1.5. Open-source text-controlled interactive world generation.
- NVIDIA Cosmos. Not consumer — developer platform for physical-AI foundation models. Self-driving and robotics customers. 2M+ downloads.
Business shape: mostly B2B and research today. Runway GWM-Robotics sells sim to robot companies (cheaper than real-world testing by orders of magnitude). Game Worlds is in consumer beta. Odyssey targets film post and game env preview. Cosmos goes developer-platform. The honest summary: still hunting for the first paying customer. Once it lands, the upside is enormous: world model as a service, priced by world count.
Don't get fooled by the demo

Current world-model capability scores: physical consistency and long-horizon stability both fail — the biggest gaps.
Honest scorecard: watchable, not usable. Good enough to show friends, not good enough for production.
| Dimension | Grade | Notes |
|---|---|---|
| Visual fidelity | B+ | Strong over seconds, blurs over minutes. Marble's static 3D is good; up close you see Gaussian-splat speckle. |
| Physical consistency | C | The biggest gap and the field's emperor's-new-clothes problem. Balls clip walls, water flows up, the cup on the desk turns into a vase when you turn back. In a CVPR 2025 benchmark, the best VLMs match random chance on distinguishing motion trajectories. Random chance. These models that "understand the world." |
| Interactive control | B− | Keyboard/mouse is basically real-time, precision is shaky (try placing a block exactly). Matrix-Game 3.0 separates mouse and keyboard signals, helps. |
| Long-horizon stability | C− | Classic autoregressive issue: error compounds. Error Buffer, Diffusion Forcing, 4D-reconstruction guidance all attempt this. None ship "infinite duration." |
| Inference efficiency | B | 20–40 fps real-time exists, but at 256×256 to 720p. 1080p/4K real-time needs another order of magnitude of compute. DyDiT-class efficiency work helps; dedicated silicon may be the final answer. |
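The inference-efficiency row is mostly arithmetic, sketched below. The resolutions are the standard pixel dimensions; treating pixel count as a proxy for compute is my simplifying assumption, and a generous one, since attention cost in transformer-based generators typically grows faster than linearly with token count.

```python
from typing import Dict, Tuple

# Standard pixel dimensions for the resolutions named in the table.
RESOLUTIONS: Dict[str, Tuple[int, int]] = {
    "256x256": (256, 256),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def pixel_count(name: str) -> int:
    """Pixels in a single frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(target: str, baseline: str) -> float:
    """How many times more pixels per frame the target needs."""
    return pixel_count(target) / pixel_count(baseline)


def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available to generate one frame."""
    return 1000.0 / fps


for name in ("1080p", "4K"):
    print(f"{name} vs 720p: {pixel_ratio(name, '720p'):.2f}x pixels per frame")
for fps in (20, 40):
    print(f"{fps} fps leaves {frame_budget_ms(fps):.0f} ms per frame")
```

On pixels alone, 4K is 9x the work of 720p and 1080p is about 2.25x, all inside a 25 to 50 ms frame budget. Starting from 256×256, even 1080p is more than 30x. That is where the "another order of magnitude" comes from before any superlinear cost is counted.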
The demos you've seen are cherry-picked from 100 runs. Adjust your priors accordingly.
What 1–3 years out probably looks like
H2 2026 (6–12 months)
- World models enter the production pipeline for scene gen / NPCs / maps as complements to traditional engines, not replacements. First payers: mid-sized studios. Big publishers have internal stacks; indie devs can't afford the new tools; the middle layer is starving and pays.
- V-JEPA family hits sim-to-real PoC in robotics. Academia gets excited, industry keeps watching.
- 4D world models become standard data augmentation for self-driving simulation. This is where world models earn the first real money — self-driving companies have cash and need sim data.
- Vibe-coded mini-games keep exploding, but 99% of it is noise. Steam's AI-generated game count triples; <1% makes money.
- AMI Labs will probably still be heads-down. LeCun isn't a product person; he's a paradigm person. Don't rush him.
2027 (1–2 years)
- Dedicated inference silicon ships. 1080p real-time interactive worlds become possible. AI-native games go from demo to 10–30 minute complete experiences. Note: complete, not good. Playing for 30 minutes without consistency collapse will be a significant achievement.
- Paths converge — hybrid (3D/4D reconstruction + generative) probably wins. Pure-generative (Oasis line) is pretty but physics-fake; pure-JEPA understands but renders nothing. Bolt them together — JEPA as the brain, DiT as the eyes — and you have the endgame.
- First world-model API officially integrated by Unity or Unreal. This is the real milestone — once you're in the engine toolchain, you've graduated from research toy to production tool. I'd bet Unity moves first; they need differentiation more.
- First wave of world-model startup deaths. Wrong technical path, no shippable product, founders who can write papers but not products: 2027 is the filter year.
- Copyright and training-data lawsuits arrive. Game companies fight harder than publishers — Rockstar's lawyers don't play.
2028 (2–3 years)
- World model + LLM + agent becomes the standard architecture. LLM as mouth, world model as eyes-and-brain, agent as hands-and-feet. LeCun's "LLM is the interface, world model is the substrate" gets validated. Looking back at 2024 pure-LLM apps from there will feel like looking at flip-phones today.
- AR glasses become the killer hardware for world models. Meta Orion, Apple Vision's descendants — without world models these are expensive screens. With them, you get real spatial computing: room layout understanding, correct occlusion between virtual and real, persistent memory of what's behind a wall you walked through. That is what AR is supposed to be.
- "Say a sentence, walk into a world" moves from sci-fi to consumer product. Visual quality will still feel like today's VRChat: usable but rough. Anyone telling you 2028 will deliver cinematic quality is fundraising.
- Bold prediction: by end of 2028, "world model" is a daily-vocabulary word the way "LLM" is now. Ordinary people won't know what JEPA is, but they'll casually say "that AI-generated room render is great." Tech eventually disappears behind product.
The direction is right. Don't rush it.
LLM → world model is a cognitive upgrade for AI. Text → image → video → 3D → interactive 3D — each step is a dimensional leap. LLMs taught AI to talk; world models are teaching AI to see the road. The gap between an AI that only talks and an AI that also navigates the world is not small.
But funding numbers can mislead. LeCun has a billion dollars and says a product is a year out. Marble works but isn't industrial. Genie 3 is dazzling but not productised. Matrix-Game 3.0 benchmarks beautifully but isn't fun. The recurring vice of the field: the demo is always the best version of the product.
The direction is certain — AI has to graduate from understanding text to understanding worlds. Who gets there first, by which path, on what timeline — that's fog.
I'm building Mana on a related premise: help ordinary people use words to create apps and interactive experiences. Once the world-model layer matures, "say a sentence, generate a world you can walk into" becomes plausible. Exciting to think about.
But until then, a lot of unglamorous work. Anyone shipping AI products knows the hard part isn't the dazzling demo. It's the model that still works in the hands of the ten-thousandth user. This field doesn't lack storytellers. It lacks people willing to fill in the boring edge cases one by one.
The era of scaling parameters is ending. The era of scaling world understanding is just starting.