What just happened
Over the last few weeks, three heavyweight platforms hit ship at almost the same time. Feishu released the official Lark CLI. DingTalk open-sourced dingtalk-workspace-cli. Google shipped Workspace CLI with a built-in MCP Server.
Meanwhile, on the AI side: OpenAI has Codex CLI. Anthropic has Claude Code (CLI-native, 300% usage growth in three months). Google has Gemini CLI.
Two threads are converging. AI needs an interface for operating the world. SaaS needs an interface for being operated by AI. The CLI is where they meet. This isn't coincidence — it's a structural shift.
Five reasons the CLI is the agent's natural interface
HackerNoon recently ran a piece titled "MCP Is Dead. The CLI Is Winning the AI Agent Stack." The headline overshoots, but the direction is right. To really see why, you can't stop at "text in, text out." You need to look at five layers: training data, token efficiency, architectural match, infrastructure, and error output.
1. Training data preference — LLMs already know how to use a CLI
Easiest to overlook, possibly the most important. LLMs are trained on billions of lines of text. A huge fraction of that text is terminal interaction — Stack Overflow answers, GitHub repos, man pages, tutorials, CI/CD config.
Ask Claude or GPT to run git log --oneline -10 and it doesn't need a schema. It saw that command used thousands of times during training. Same for pytest, docker, kubectl, gh.
MCP Servers are the opposite. Every server is a custom schema the model meets for the first time at runtime. Even a perfect description requires on-the-fly reasoning over an unfamiliar interface. It's like asking a fluent English speaker to read a newly invented conlang — they can guess, but they're never as accurate as native.
The New Stack editor put it well: "The CLI is where you do deterministic work — one expected outcome, one reasonable path." That's exactly the regime LLMs perform best in. When writing code, LLMs sometimes create. When typing commands, they're engineering — mapping natural language onto an existing set of commands.
2. Token efficiency — CLI is pay-as-you-go, MCP is all-you-can-eat
The context window is the LLM's scarcest resource. Every token you spend on tool schema is a token you can't spend on reasoning.
MCP has a structural problem: connecting to an MCP Server injects the full tool schema into the agent's system prompt. A GitHub MCP Server has 93 tools, and the schema alone burns ~55,000 tokens before you ask anything. A CLI call is a bash command and its stdout. Tokens are spent only when a command actually runs.
The real numbers are damning:
- CircleCI browser automation benchmark. CLI was 33% more token-efficient than MCP. Task completion score 77 vs 60. The gap widened on multi-step debug workflows — the MCP approach ran out of context mid-task; CLI finished.
- DOM querying. MCP's snapshot returns a 52,000-token accessibility tree. Two targeted CLI queries cost 1,200 tokens. 43× efficiency gap.
- File conversion. Switching from MCP Server to CLI tools dropped token consumption ~40%. Agent had 95% of context available for reasoning and finished the entire pipeline in one shot.
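The arithmetic behind these figures is easy to check. The following is a minimal sketch that uses only the numbers quoted above. The 200,000-token context window is an assumption made for illustration and does not come from any of the benchmarks.

```python
# Figures quoted in the text above.
GITHUB_MCP_SCHEMA_TOKENS = 55_000   # approximate schema cost of the 93-tool server
DOM_SNAPSHOT_TOKENS = 52_000        # MCP accessibility-tree snapshot
DOM_CLI_QUERY_TOKENS = 1_200        # two targeted CLI queries combined

# Assumption for illustration only: a 200k-token context window.
ASSUMED_CONTEXT_WINDOW = 200_000


def efficiency_ratio(baseline_tokens: int, alternative_tokens: int) -> float:
    """How many times more tokens the baseline spends than the alternative."""
    return baseline_tokens / alternative_tokens


def ambient_share(schema_tokens: int, window_tokens: int) -> float:
    """Fraction of the window consumed by schema before any work is done."""
    return schema_tokens / window_tokens


dom_ratio = efficiency_ratio(DOM_SNAPSHOT_TOKENS, DOM_CLI_QUERY_TOKENS)
schema_share = ambient_share(GITHUB_MCP_SCHEMA_TOKENS, ASSUMED_CONTEXT_WINDOW)

print(f"DOM querying gap: {dom_ratio:.1f}x")
print(f"Schema share of assumed window: {schema_share:.1%}")
```

Under that assumed window size, the schema alone would occupy a bit more than a quarter of the context before the first request. A different window size changes the share but not the shape of the problem.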
One dev deleted three MCP Servers and went CLI-direct. The reason was crisp: "MCP has an ambient token cost — tool definitions must live in system prompt, eating context before you've used anything. CLI calls are one bash command and its stdout — used and gone."
Same logic explains why Vercel went from 80% to 100% accuracy after pruning agent tools from 15 to 2 — fewer tool definitions, more context for reasoning. CLI is pay-per-use context by construction.
This is also why Anthropic is pushing the search tool primitive — progressive disclosure of tools, not pre-loaded schemas.
3. Architectural match — text in, text out
An LLM is fundamentally a text-to-text function. So is the CLI. The isomorphism means zero conversion overhead.
What does a GUI operation require? Screenshot → pixel recognition → coordinate location → simulated click → wait for render → screenshot again to confirm. Each step adds latency and error. And GUIs change — buttons move, modals get redesigned, your automation breaks.
CLI? Submit a line of text, get a chunk of text back. The agent doesn't have to see anything. The filesystem state is binary — a file is on disk or it isn't, a test passed or it didn't. That binariness reduces hallucinations — the agent doesn't guess whether a change took effect, it cats or ls's to verify.
IDE-integrated AI has a hidden cost too: maintaining editor state (which files are open, where the cursor is, what's selected) — Cursor sends this every interaction. CLI agents track none of it. Read the file, do the work, that's it. The entire context window is available for actual tasks.
4. Infrastructure advantage — composable, scriptable, parallelisable
The Unix philosophy — small tools doing one thing, composed by pipes — fits how agents work. This isn't a metaphor. It's directly reusable infrastructure.
Composable. pytest --tb=short 2>&1 | head -50 truncates long output. The LLM saw that pattern thousands of times in training. MCP responses force the agent to parse and filter inside the context window — moving shell work into tokens.
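To make the point concrete, here is a rough sketch of what a harness has to do when it cannot lean on the shell: merge stderr into stdout and keep only the first 50 lines before anything reaches the model. The function name run_truncated is hypothetical, and a child Python process stands in for a noisy test run so the example does not depend on pytest being installed.

```python
import subprocess
import sys


def run_truncated(argv: list[str], max_lines: int = 50) -> str:
    """Run a command, merge stderr into stdout (like 2>&1), keep the first lines (like head)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,  # a failing run is still useful output for the agent
    )
    lines = result.stdout.splitlines()
    return "\n".join(lines[:max_lines])


# Stand-in for a noisy command: a child Python process printing 200 lines.
noisy = [sys.executable, "-c", "for i in range(200): print('line', i)"]
excerpt = run_truncated(noisy, max_lines=50)
print(len(excerpt.splitlines()))
```

Unlike a real pipe into head, this sketch waits for the command to finish before truncating. The shell one-liner gets the same effect with no harness code at all, which is the argument the paragraph above is making.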
Scriptable. A CLI agent can run inside CI/CD pipelines — a bash script lets the agent auto-review code on every PR, run tests, fix lint errors. Try doing that with an IDE plugin. The CLI lets the agent become just another binary in the toolchain.
Underneath scriptability there's a deeper alignment with how LLMs actually execute algorithms:
An LLM agent's output is essentially: write code, then execute it. Every mainstream agent framework now follows the same cycle: model emits a structured instruction (tool call) → harness parses and runs it → stdout/stderr go back into context → model decides next step. That's a ReAct loop (Reason → Act → Observe → Reason). The CLI fits the loop perfectly: the model reasons out a bash command, the shell runs it and returns text, the model observes and reasons again. Pure text, zero modal conversion.
One layer deeper: Code-as-Action. Frontier agent design is shifting from "pick a tool" to "write code" as the action space. The reason is simple. Traditional tool-calling makes the model select one tool from a list and fill in parameters. More tools means a bigger context and a higher chance of picking the wrong one (the Vercel 15→2 lesson). With Code-as-Action, the model emits executable code (bash/Python) directly and the sandbox runs it. The action space is theoretically infinite: anything expressible in code, without pre-defining every tool. Anthropic's MCP tool optimisation walks in this direction, replacing 150,000 tokens of tool schema with letting the agent browse the filesystem to discover APIs. 150K → 2K, a 98.7% drop.
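The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular framework implements it: the "model" is a hypothetical scripted function rather than an LLM, and commands are passed as argv lists instead of shell strings to keep the example portable.

```python
import subprocess
import sys
from typing import Callable, Optional

# A "model" here is any function that maps the transcript so far to the next
# command (argv list), or None when it considers the task done.
Model = Callable[[list[str]], Optional[list[str]]]


def react_loop(model: Model, max_steps: int = 5) -> list[str]:
    """Reason -> Act -> Observe until the model stops or the step budget runs out."""
    transcript: list[str] = []
    for _ in range(max_steps):
        command = model(transcript)          # Reason: pick the next command
        if command is None:
            break
        result = subprocess.run(             # Act: run it
            command, capture_output=True, text=True, check=False
        )
        observation = (result.stdout + result.stderr).strip()
        transcript.append(observation)       # Observe: feed the text back
    return transcript


def scripted_model(transcript: list[str]) -> Optional[list[str]]:
    """Hypothetical stand-in for an LLM: runs one command, then stops."""
    if not transcript:
        return [sys.executable, "-c", "print('2 passed')"]
    return None


history = react_loop(scripted_model)
print(history)
```

Everything that crosses the boundary between model and environment is plain text, so the harness needs no conversion layer beyond capturing stdout and stderr.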
The CLI is the natural execution layer for Code-as-Action. git log --oneline -10 | grep "fix" is code — it expresses an intent, the shell executes it, structured output comes back. The model didn't pick one of 93 GitHub tools. It wrote one line.
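The filesystem-discovery idea mentioned above can be illustrated with a small sketch. The directory layout, file format, and tool names below are all hypothetical and are not Anthropic's actual design. The point is only that listing names is cheap, while a full description is loaded for the one tool the agent decides to use.

```python
import json
import tempfile
from pathlib import Path


def write_demo_tools(root: Path) -> None:
    """Create a few hypothetical tool descriptions, one JSON file per tool."""
    tools = {
        "calendar_list": {"description": "List calendar events", "params": ["date"]},
        "message_send": {"description": "Send a message", "params": ["to", "text"]},
        "sheet_update": {"description": "Update a sheet cell", "params": ["cell", "value"]},
    }
    for name, spec in tools.items():
        (root / f"{name}.json").write_text(json.dumps(spec))


def list_tool_names(root: Path) -> list[str]:
    """Cheap first step: names only, no schemas loaded into context."""
    return sorted(p.stem for p in root.glob("*.json"))


def load_tool(root: Path, name: str) -> dict:
    """Expensive step, done only for the tool the agent decided to use."""
    return json.loads((root / f"{name}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_demo_tools(root)
    names = list_tool_names(root)
    chosen = load_tool(root, "message_send")

print(names)
print(chosen["params"])
```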
Parallelisable. Each agent is its own process, its own context window, its own token budget, its own tool permissions. git worktree lets multiple agents safely work on the same project on different branches.
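As a rough illustration of the worktree setup, the sketch below builds one git worktree add invocation per agent without running it. The repository path, the agent names, and the agent/ branch prefix are assumptions chosen for the example.

```python
from pathlib import PurePosixPath


def worktree_command(repo: PurePosixPath, agent_id: str) -> list[str]:
    """Build a `git worktree add` call giving one agent its own branch and directory."""
    branch = f"agent/{agent_id}"
    checkout_dir = repo.parent / f"{repo.name}-{agent_id}"
    # -C runs git as if started in `repo`; -b creates the new branch.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(checkout_dir)]


commands = [
    worktree_command(PurePosixPath("/work/app"), agent) for agent in ("lint", "tests")
]
for argv in commands:
    print(" ".join(argv))
```

Each agent then runs as a separate process inside its own checkout directory, so edits on one branch never touch the files another agent is reading.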
5. Error output — whether an agent can self-heal depends entirely on how good the errors are
Good CLIs return error messages that are structured, parseable, actionable. The agent reads them and knows the next step without human help.
A GUI pops up a dialog saying "operation failed," and the agent has nothing to work with. A CLI returns "ERROR: parameter --workspace-id is required. Run lark-cli workspace list to find available IDs." and the agent immediately knows what to do.
Feishu's Lark CLI is specifically optimised for this. DingTalk's --dry-run goes further — the agent previews an action's outcome before deciding whether to execute. These aren't conveniences for humans. They're infrastructure for agents.
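A small sketch shows why an actionable error is machine-usable. It treats the error text quoted above as an illustrative string, not verified lark-cli output, and the "Run <command> to <purpose>." convention it parses is an assumption made for this example.

```python
import re
from typing import Optional

# The example error text quoted above; illustrative, not verified CLI output.
ERROR_TEXT = (
    "ERROR: parameter --workspace-id is required. "
    "Run lark-cli workspace list to find available IDs."
)

# Hypothetical convention: "Run <command> to <purpose>."
HINT_PATTERN = re.compile(r"Run (?P<command>.+?) to (?P<purpose>[^.]+)\.")


def suggested_command(error_text: str) -> Optional[str]:
    """Extract the follow-up command the error recommends, if any."""
    match = HINT_PATTERN.search(error_text)
    return match.group("command") if match else None


def missing_flag(error_text: str) -> Optional[str]:
    """Extract the name of the required flag the error complains about, if any."""
    match = re.search(r"parameter (--[\w-]+) is required", error_text)
    return match.group(1) if match else None


print(missing_flag(ERROR_TEXT))
print(suggested_command(ERROR_TEXT))
print(suggested_command("operation failed"))
```

The GUI-style message yields nothing to act on, while the CLI-style message yields both the missing flag and the command that supplies its value.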
Three shots in one week
Feishu Lark CLI (2026.03.28)
Feishu open-sourced Lark CLI (@larksuite/cli), MIT, written in Go. Compresses 11 product domains into 200+ commands — messages, docs, calendar, mail, sheets, multi-dimensional tables, tasks, wiki, video conferencing, drive, spreadsheets.
The README's "Why lark-cli?" section says it plainly:
- Agent-Native Design. 19 structured Skills out of the box. Compatible with Claude Code, Cursor, Windsurf — zero extra config.
- AI-Friendly & Optimised. Every command tested with real agents. Concise params, smart defaults, structured output — to maximise agent success rate.
- Three-Layer Architecture. Shortcuts (human + AI friendly) → API Commands (platform sync) → Raw API (full coverage). Pick the granularity.
- Up and Running in 3 Minutes. One-tap app creation, scan-to-login, first API call within three steps.
The founder of 53AI installed it and immediately had Claude Code send personalised messages to 25 employees — done in a minute. Then the agent read multi-dim tables and generated a visualisation page, then audited the project tracker for missing fields and sent reminders.
The Wake Word feature is wild. Set the trigger "lobster lobster." In a meeting, you say "lobster lobster, organise this plan into a doc and send it to the boss." After the meeting, the agent finds your instruction in the transcript and executes.
"Feishu's move is bold — traditional products measure DAU and session time, and both require opening the GUI. After CLI open-sources, agents call commands directly. Sometimes you don't open Feishu at all. Classic DAU might fall, but real user value rises." — 53AI
DingTalk dingtalk-workspace-cli (2026.03)
Alibaba's DingTalk open-sourced dingtalk-workspace-cli (DWS), Apache-2.0, Go. Also "designed for both human users and AI agent scenarios." Covers calendar, todos, directory, attendance, more.
- Discovery-driven pipeline. The CLI hard-codes no product commands. It dynamically discovers services from an MCP registry (mcp.dingtalk.com) and generates a Cobra command tree. Backend ships a new product? CLI doesn't need code changes: the registry adds a service description, and dws generates the matching commands automatically. Elegant design.
- Intent decision tree + Agent Skills. Twelve Python scripts under skills/. Not API docs, but operating instructions. Decision trees guiding the agent through user intent. Installed to ~/.agents/skills/dws automatically; most AI agents discover them.
- Safety + preview. Dangerous operations require human confirmation. Batch limit of 30. --dry-run previews without executing. -f json structured output throughout.
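The discovery-driven idea in the list above can be sketched with Python's argparse standing in for Cobra. The registry shape, the service names, and the demo-cli program name are all invented for illustration. The article does not describe the real registry format, and this is not how dws is implemented.

```python
import argparse

# Hypothetical registry payload; the real MCP registry format is not
# described in this article.
REGISTRY = {
    "calendar": {"list": ["date"], "create": ["title", "start"]},
    "todo": {"add": ["text"]},
}


def build_parser(registry: dict) -> argparse.ArgumentParser:
    """Generate the whole command tree from registry data, with no per-product code."""
    parser = argparse.ArgumentParser(prog="demo-cli")
    parser.add_argument("--dry-run", action="store_true")
    services = parser.add_subparsers(dest="service", required=True)
    for service_name, actions in registry.items():
        service = services.add_parser(service_name)
        action_parsers = service.add_subparsers(dest="action", required=True)
        for action_name, params in actions.items():
            action = action_parsers.add_parser(action_name)
            for param in params:
                action.add_argument(f"--{param}", required=True)
    return parser


def execute(args: argparse.Namespace) -> str:
    """Preview the call under --dry-run; otherwise pretend to perform it."""
    call = f"{args.service}.{args.action}"
    if args.dry_run:
        return f"DRY RUN: would call {call}"
    return f"called {call}"


parser = build_parser(REGISTRY)
args = parser.parse_args(["--dry-run", "todo", "add", "--text", "ship it"])
print(execute(args))
```

Adding a new entry to the registry dictionary and rebuilding the parser is enough to expose a new command, which mirrors the claim that the backend can ship a product without a CLI code change.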
DingTalk's MCP marketplace (mcp.dingtalk.com) is the parallel move — publishing DingTalk products (like AI sheets) as MCP Servers so third-party agents in OpenClaw can operate DingTalk data directly.
Google Workspace CLI (2026.03)
Google released the open-source Workspace CLI (Apache-2.0) with gws mcp built in — boots an MCP Server over stdio, letting Claude Desktop, Gemini CLI, VS Code, or any MCP client access Drive, Gmail, Calendar, Docs, Sheets.
The commercial logic underneath
The SaaS interface revolution
For ten years, SaaS competed on UI/UX. For the next ten, it may compete on API/CLI friendliness to AI.
Built In reported AI has wiped $1T from SaaS market caps. Gartner projects that by 2030 at least 40% of enterprise SaaS spend shifts to usage / agent / outcome pricing — when agents replace humans operating software, per-seat pricing collapses. Databricks' 2026 survey: multi-agent system adoption up 327% in four months. Nvidia GTC 2026's core thesis: the agent era will be bigger than the model era.
Enterprise SaaS is transitioning from apps built for humans to platforms built for humans and AIs simultaneously. The CLI isn't a retreat to command-line nostalgia. It's the AI era's redefinition of interface.
"Built for agents" as a product philosophy
Karpathy's February 2026 advice: export docs as Markdown, write Skill files, make sure CLIs are agent-usable. Feishu nailed all three.
Compare Feishu's and DingTalk's READMEs — both declare "built for humans and AI agents" / "designed for both human users and AI agent scenarios." Not coincidence. Industry consensus, expressed in parallel.
DingTalk's discovery-driven pipeline goes further — the command list isn't even hard-coded, it's dynamically pulled from the MCP registry. Add a feature on the product side; the agent side picks it up automatically. Google's --sanitize flag is a reminder: when products are operated by agents, the security model has to evolve too.
A new competitive axis
"Whose product is easier to operate from AI" is the new competitive dimension. Products that integrate seamlessly with agents will win. Products that don't will be skipped in the next generation of workflows. The SaaS industry is becoming middleware — the end-user product becomes an agent backend service. The CLI is the technical interface for that transition.
Prediction: headless apps (no GUI, embedded inside agent IDEs) will proliferate.
A few unhedged takes
toD may stop existing. There's only toA (to Agent).
toB, toC, toD (to Developer) — SaaS's taxonomy for ten years. But if agents become software's primary operators, API / SDK / docs design isn't aimed at human developers any more. It's aimed at agents.
Feishu Lark CLI's 19 Skill files aren't written for humans. They're written for Claude Code. DingTalk's intent decision tree isn't there to help people understand the API. It's there to help agents understand user intent. Google's --sanitize isn't defending against humans. It's defending against prompt injection.
If this trend keeps going, SDK package names might end in -agent-sdk. Quick Start in READMEs stops teaching humans and starts teaching agents. Your competitor isn't a rival SaaS product. It's a rival SaaS product whose CLI is the one agents prefer to use.
DAU might need redefining
If a user calls your CLI 200 times a day through agents but never opens your app, is that user a daily active user? If your investors are still measuring you with GUI-era metrics (MAU, DAU, retention), they might be holding the wrong ruler against a new world. Future measures: number of agents calling your CLI, daily API call volume, agent task success rate, percentage of errors that agents resolved without human intervention.
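As a sketch of how those measures might be computed from a call log: the record fields and the sample data below are made up for illustration, and a real product would need to decide how calls map onto tasks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    agent_id: str
    succeeded: bool      # did the task this call belonged to finish?
    hit_error: bool      # did the CLI return an error at some point?
    needed_human: bool   # did a human have to step in?


def agent_metrics(calls: list[Call]) -> dict:
    """Compute the agent-era measures listed above from one day of calls."""
    if not calls:
        raise ValueError("no calls recorded")
    errors = [c for c in calls if c.hit_error]
    self_healed = [c for c in errors if not c.needed_human]
    self_heal_rate: Optional[float] = (
        len(self_healed) / len(errors) if errors else None
    )
    return {
        "distinct_agents": len({c.agent_id for c in calls}),
        "call_volume": len(calls),
        "task_success_rate": sum(c.succeeded for c in calls) / len(calls),
        "self_heal_rate": self_heal_rate,
    }


# Made-up sample data for one day.
log = [
    Call("agent-a", True, False, False),
    Call("agent-a", True, True, False),
    Call("agent-b", False, True, True),
    Call("agent-b", True, False, False),
]
metrics = agent_metrics(log)
print(metrics)
```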
GUI won't die. It becomes the result-display layer.
Human speaks. Agent uses the CLI. Result renders in the GUI. Feishu's Wake Word already does this — in a meeting, you say one sentence; the agent uses the CLI to operate Feishu; you see the finished doc in the Feishu app. You operate via voice; the agent operates via CLI; the GUI displays the outcome.
Three layers each in their proper place: natural language is the human interface, CLI is the agent interface, GUI is the result interface.
SaaS without a CLI will be routed around
When an agent needs to complete a cross-product workflow — check calendar, send a message, update a sheet, write a doc — it preferentially picks products with a CLI or MCP Server. No CLI? The agent either uses Computer Use to simulate mouse clicks (slow, fragile, expensive), or routes around your product entirely.
That's why Feishu and DingTalk shipped CLIs simultaneously. Not because developers asked. Because agents need them. Whoever lays the interface first claims a spot in the agent's default toolchain. It's a user-acquisition war — except this time the user is an AI.
The CLI renaissance isn't nostalgia. Computers started on the command line. GUIs gave ordinary people access. Now the CLI is back — not for humans, for AIs. Human-computer interaction has traced a spiral: command line → GUI → command line (for AI) → next? Probably the three-layer arrangement: humans speak, agents use the CLI, the GUI displays the outcome. Looking back from there, March 2026's simultaneous Feishu / DingTalk / Google CLI launches will mark the start of the transition.