# Every Item on This Checklist Exists Because Something Broke
March 4, 2026
My AI told me all tests pass. It was right. Every test I had written came back green.
The problem was what I hadn't written. Forty percent of my code had zero tests. The kill path for the daemon. The CLI commands. The main loop. All untested. All shipped with confidence.
That was February 2026, on a Go CLI tool I was building. It was the third time in two months I'd been burned by the gap between "tests pass" and "this is ready."
## The pattern
It kept happening.
My server gateway crashed 67 times in three days. The Node.js memory limit was set higher than the operating system's memory ceiling. Every time the process tried to use the memory it was told it had, the OS killed it. Restarted. Killed. Restarted. Sixty-seven times. The fix was making one number smaller than the other. The correct values were in the documentation I hadn't read.
I built a function called `awardScanXP` in my mobile app. Wrote it. Tested it in isolation. Tests passed. Shipped it. The function was never called from any screen. It sat there, perfectly tested, doing nothing.
I built a restart webhook for my server. In a single session, I shipped a security token in the URL, wrote unvalidated user input into config files, and used GET requests to mutate server state. Every one of these is a well-documented security mistake. I found them all after the fact.
## What I learned
AI coding assistants are confident, not thorough.
When Claude tells me "all tests pass," it is telling the truth. But "all tests pass" only means the tests that exist are green. It says nothing about the tests that should exist but don't. It says nothing about the function that was written but never wired up. It says nothing about the config value that contradicts another config value three files away.
The AI is not lying. It is answering a narrow question accurately. I was treating a narrow answer as a comprehensive one.
## The plane that was too complex to remember
In 1935, Boeing's Model 299 prototype crashed during an Army Air Corps evaluation at Wright Field in Dayton, Ohio. The pilot forgot to release a locking mechanism on the elevator and rudder controls. The plane climbed to three hundred feet, stalled, and crashed. Two of the five crew members died.
The plane was not defective. The investigation concluded it had more controls, more switches, more steps than any aircraft before it. Too complex for memory alone.
Boeing's response was not to simplify the plane. They created the first standardized pre-flight checklist. A document listing every step a pilot needed to complete before takeoff. Not because pilots were incompetent, but because the system had outgrown what any one person could reliably hold in their head.
With that checklist, pilots flew the production successor, the B-17, 1.8 million miles without an accident. This story is told in detail in Atul Gawande's The Checklist Manifesto.
I think about it every time my AI assistant confidently ships broken code. The problem is not that the tool is bad. The problem is that the system is too complex for confidence alone.
## Starting the list
After the third or fourth failure, I started writing things down. Not a lessons-learned document. Not a retrospective. Just a list.
Each item was one line. What to check. Why. Which project. The date.
"Map every exported function to a test file. No test, flag it. From devreap, February 2026."
"Read the project docs fully before writing code. From the gateway crash, February 2026. 67 crashes because the answer was already written down."
"For every new function, name the call site before you write the first line. From LooksMaxx, February 2026. `awardScanXP`: written, tested, never called."
"Grep committed files for API key patterns before pushing. From LooksMaxx, when I pushed a key in `eas.json`."
The list grew. Every failure added a line.
## The incidents

### 67 crashes in three days
The gateway ran on a small server. Node.js was configured to use up to a certain amount of memory. The operating system was configured to kill any process that exceeded a lower amount. These two numbers contradicted each other:
```
# systemd service file
MemoryHigh=2400M

# Node.js flag
--max-old-space-size=3000
```
Every time Node tried to use the memory it was told it could have, the OS killed it.
The fix was changing one number. The documentation said exactly which number and what to set it to. Nobody read the docs before writing the config.
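This kind of contradiction is mechanically checkable. Here is a minimal sketch of that check in Python; the function names are mine, not from the real gate script, and the regexes only cover the two settings from this incident.

```python
import re

def node_heap_mb(node_flags: str) -> int:
    """Extract the V8 old-space heap limit, in MB, from a Node.js flag string."""
    m = re.search(r"--max-old-space-size=(\d+)", node_flags)
    return int(m.group(1)) if m else 0

def os_ceiling_mb(unit_text: str) -> int:
    """Extract the MemoryHigh ceiling, in MB, from a systemd unit file body."""
    m = re.search(r"MemoryHigh=(\d+)M", unit_text)
    return int(m.group(1)) if m else 0

def heap_fits(node_flags: str, unit_text: str) -> bool:
    """The heap limit must stay strictly below the OS ceiling, or the OS wins."""
    return node_heap_mb(node_flags) < os_ceiling_mb(unit_text)

# The contradiction from the incident: a 3000M heap under a 2400M ceiling.
assert not heap_fits("--max-old-space-size=3000", "MemoryHigh=2400M")
```

Running a check like this before deploy turns "the docs said so" into something a script can enforce.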
The checklist item that came from it: read the project documentation before writing any code. Not skim it. Read it.
### The function that did nothing
In my mobile app, I added a feature to award experience points when users completed a scan. I wrote the function. I wrote tests for it. The tests passed. I moved on.
Weeks later I realized the function was never called from any screen. The feature did not exist in the running app. The code was there. The tests were green. The wiring was not.
The checklist item: for every new function, name the call site before you write the first line. If you cannot name where it will be called, you are building dead code.
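The wiring side of that item can be approximated with a crude text search. A sketch in Python, assuming Go-style `func` definitions (`has_call_site` is my name for it, and this is a grep heuristic, not real static analysis):

```python
import re

def has_call_site(func_name: str, sources: list) -> bool:
    """True if func_name is invoked on some line other than its own definition.

    sources is a list of file contents as strings.
    """
    definition = re.compile(rf"func\s+{re.escape(func_name)}\s*\(")
    call = re.compile(rf"{re.escape(func_name)}\s*\(")
    for text in sources:
        for line in text.splitlines():
            if call.search(line) and not definition.search(line):
                return True
    return False
```

A function that only ever appears on its own definition line is dead code, which is exactly what `awardScanXP` was.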
### Three security failures in one afternoon
The restart webhook was a small Python script that let me restart my server from my phone. In one session, three failures shipped together.
- The authentication token was in the URL. Anyone who saw the URL had full access.
- User input was written directly into config files with no validation.
- The restart action ran on a GET request, meaning a browser prefetch or link preview could trigger a server restart.
Every one of these is a textbook mistake. I shipped all three in the same afternoon and found them later. The checklist item that came from this is the last on the list: before you push, do a senior architect review. Ask yourself whether an experienced engineer who reads every line would be satisfied. If not, fix it first.
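For reference, the shape of the fix, sketched in Python with illustrative names (`EXPECTED_TOKEN` would come from an environment variable in practice, and the allow-list is hypothetical): token in a header compared in constant time, an allow-list instead of raw input, and mutation only on POST.

```python
import hmac

EXPECTED_TOKEN = "change-me"          # in practice, read from the environment
ALLOWED_ACTIONS = {"restart", "status"}

def authorized(headers: dict) -> bool:
    """Compare the bearer token from a header, in constant time."""
    supplied = headers.get("Authorization", "")
    if supplied.startswith("Bearer "):
        supplied = supplied[len("Bearer "):]
    return hmac.compare_digest(supplied, EXPECTED_TOKEN)

def validated_action(raw: str):
    """Only allow-listed actions ever reach a config file or the shell."""
    return raw if raw in ALLOWED_ACTIONS else None

def handle(method: str, headers: dict, action: str) -> int:
    """Return an HTTP status code for a restart-webhook request."""
    if method != "POST":              # GET must never mutate server state
        return 405
    if not authorized(headers):
        return 401
    if validated_action(action) is None:
        return 400
    return 200                        # safe to perform the restart
```

None of this is clever. That is the point: each line corresponds to one of the three textbook mistakes.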
### The key in the repo

I pushed an API key to GitHub inside a configuration file. Not a `.env` file. Not a secrets manager. A regular JSON config that got committed with everything else. The checklist item: grep every staged commit for patterns like `sk_`, `pk_`, `ghp_`, and `Bearer` before pushing.
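That grep is simple enough to sketch. A starting set of patterns in Python (the exact regexes are mine, and a dedicated scanner like gitleaks covers far more):

```python
import re

# Common credential prefixes; a starting point, not an exhaustive list.
SECRET_PATTERNS = [
    r"sk_[A-Za-z0-9]{8,}",             # secret keys (Stripe-style)
    r"pk_[A-Za-z0-9]{8,}",             # publishable keys
    r"ghp_[A-Za-z0-9]{8,}",            # GitHub personal access tokens
    r"Bearer\s+[A-Za-z0-9._\-]{8,}",   # bearer tokens pasted into configs
]

def find_secrets(text: str) -> list:
    """Return every substring that looks like a leaked credential."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(re.findall(pattern, text))
    return hits
```

Run it over `git diff --cached` output before every push; a non-empty result blocks the commit.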
## The gate
A checklist you can forget to run is not a checklist. It is a suggestion.
So the list became a shell script. The script runs before every deploy. It checks what it can automatically: grepping for API keys in committed files, verifying that every exported function has a test, scanning for empty catch blocks. For items it cannot check automatically, it prompts. Did you read the docs? Did you verify the environment config? Did you do the architect review?
If any check fails, the deploy is blocked. Not warned. Blocked.
This is the important part. The checklist is not a reminder to be careful. It is a gate that prevents deployment until the checks are done. The same way a pre-flight checklist is not a suggestion to maybe check the elevator controls.
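The blocking behavior is the whole design, and it fits in a few lines. A minimal sketch of the same logic in Python (my real gate is a shell script; the check names and prompts here are illustrative):

```python
def run_gate(automated_checks, manual_items, confirm=input) -> bool:
    """Run automated checks, then prompt for manual items. Any failure blocks."""
    for name, check in automated_checks:
        if not check():
            print(f"BLOCKED: {name} failed")
            return False
    for question in manual_items:
        if confirm(f"{question} [y/N] ").strip().lower() != "y":
            print(f"BLOCKED: {question}")
            return False
    print("All checks passed. Deploy may proceed.")
    return True

# Demonstration with an injected answer instead of a live prompt:
ok = run_gate(
    automated_checks=[("no secrets staged", lambda: True)],  # placeholder check
    manual_items=["Did you read the docs?", "Did you do the architect review?"],
    confirm=lambda question: "y",
)
```

A real version would `sys.exit(1)` on failure so that CI and git hooks treat a block as a hard stop, not a warning.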
## Three phases
The checklist has three phases.
| Phase | When | What |
|---|---|---|
| Pre-engineering | Before writing any code | Read the docs. Define what "done" means. Name the call sites for new functions. |
| Pre-testing | Before saying "tests pass" | Map every function to a test. Verify call site wiring. Check error paths. State the coverage number. |
| Pre-deployment | Before pushing | Full local build. Full test suite. Secret scan. Environment config verification. Senior architect review. |
Every item on the list has a date. Every item has a project name. Every item traces to a specific failure that actually happened.
## Start yours
You do not need my checklist. You need your own.
The next time something breaks, do not just fix it. Add one line to a file. What to check. Why. Where the failure came from.
After a few months, you will have a document that is more valuable to you than any best practices guide you could find. Because every item on it is something that actually burned you. Not theoretical. Not aspirational. Real.
The aviation industry learned this in 1935. A checklist born from a fatal crash let pilots fly that same aircraft 1.8 million miles without another accident. Not because they got better at remembering. Because they stopped relying on memory.
If you are using AI to write code, you need the same discipline. Not because the AI is bad. Because the system is too complex for confidence alone.