Claude Doesn't Need to Think That Hard About 'ok'

On February 17th, Anthropic killed OAuth login for third-party tools like OpenClaw and TinyClaw.

Before that, both agents ran under my Claude subscription — effectively free. After that, every token cost real money. TinyClaw was running Opus 4.6 on literally every message. Status pings. One-word replies. "ok." "do it." "yea." All going to a $5/million-input, $25/million-output model.

That's 5x more expensive than Haiku on both input and output. For a one-word reply, that adds up fast.

Two days later I dropped a PRD into a session and asked Claude to analyze every prompt I'd ever sent — across Claude Code, OpenClaw, and TinyClaw — to find patterns I could use to build a routing layer. The doc was titled "Selective Reasoning Escalation."

Version one was not the MCP

Claude pulled 505 Claude Code sessions, 2,097 OpenClaw messages, 1,328 TinyClaw log entries.

The pattern was obvious once you saw it. My actual cheap prompts: "hello?", "status", "ok", "yea", "wtf", "do it", "go with option a". Short, no verb, conversational. My expensive prompts: architecture questions, debugging, system design.

So version one was a keyword-based TypeScript router wired directly into TinyClaw's invoke.ts. Word count, verb detection, complexity signals. Three tiers: cheap, medium, premium. It passed 15 tests and went live that same day.

That wasn't ThinkGate. That was duct tape.

The actual insight

A few hours later I was looking at the rule-based router and it already felt wrong. Keyword matching breaks the moment someone asks something unusual. You're writing rules forever.

The insight that changed it: what if Claude classified the request before answering it?

Not me maintaining a rules engine. Not manual tags. Haiku — the cheapest, fastest model — reads your prompt and decides how hard Claude needs to think. Claude decides for Claude.

Incoming prompt
      ↓
  Haiku (~200ms, ~$0.0001)
  "Is this fast / think / ultrathink?"
      ↓
Route to the right effort level

It's self-referential and it works. Haiku is genuinely good at classifying complexity. One call, almost zero cost, and you never have to think about model settings again.

Building it

I wanted this as an MCP server so it would work everywhere — Claude Desktop, Claude Code, any agent framework. Install once, done.

The first live test failed immediately. Haiku was wrapping its JSON response in a code fence:

```json
{"tier": "think", ...}
```

The parser choked. Fixed with two lines of stripping before the JSON parse. Classic.

Then I tested all three tiers:

what is 2+2 → fast, no thinking
review this function for bugs → think, medium effort
design a distributed rate limiter at 10M rps → ultrathink, max effort

All three hit correctly. That was the moment it went from theoretical to real.

The name

I had it as mcp-thinking-router and it was fine but forgettable. Claude ran a naming analysis. The options were automode, mindrouter, cerebro, thinkgate.

ThinkGate stuck because the metaphor is right. It's the gate between your prompt and how hard Claude thinks. Short enough to say out loud. Distinct enough to own.

npm in 2026 is a disaster

The publish took longer than the build.

Classic tokens are dead as of December 9th, 2025. Automation tokens — dead too. The only path to bypassing 2FA for a non-interactive publish is a Granular Access Token with "Bypass 2FA for noninteractive automated workflows" explicitly checked. That setting isn't obvious. I went through three token types before finding it.

Then the MCP registry submission failed on a 100-character description limit and a version mismatch between the server.json and npm. Fixed both, resubmitted.

mcp-thinkgate@0.1.1 added a rule-based fallback so you don't need an API key at all — if you have one, you get AI classification; if you don't, you get smart keyword heuristics. No friction on install.

Why it matters to me

If you're using Claude's API, you're either paying for extended thinking on questions that don't need it, or skipping it entirely on questions that do. The mental overhead of "should I turn on thinking for this?" is real, and it's easy to just pick one setting and leave it.

ThinkGate removes the decision. You ask your question. The right amount of reasoning happens automatically.

That's the direction all of this should go — the mechanics disappear and the work becomes the thing.

Repo: github.com/tjp2021/mcp-thinkgate