
As generative AI becomes better and better at writing code, one of the positive trends I am seeing unfold before my eyes is the rise of test coverage.
Not too long ago, it would write crappy tests and leave it at that.
These days, with agents, it will write crappy tests, run them, and iteratively improve the tests until they pass. If you dare to look under the hood, it will make changes, then call itself stupid, saying, “of course that is not meant to work,” and then keep trying different things until it finds something that works.
What is the result? Everyone’s PRs have 100% test coverage with passing crappy tests.
What is wrong with passing crappy tests? What’s so crappy about them? And they are, after all, tests, right? More tests are better than no tests, right?
Let me start by answering the easiest question. The answer is no. More tests are not always better.
This is a bit hard to see at first, especially if you have worked in code-coverage-poor codebases all your career.
You ship something. Oops, something else broke. If only there had been a test for that thing, you would have known about it before shipping.
So you fill your codebase with tests. Run them before commits. Run them in CI before merging. We want more and more and more tests, right?
Half your feedback on PRs is about adding more tests. You have set up special rules whereby PRs can’t reduce test coverage. I am sure there is more, but you get the gist.
If you are unfortunate enough, like me, you have also worked in the opposite environment: codebases with almost 100% test coverage, written by yours truly.
Every class or module has unit tests. And then there are integration tests, and system tests, and end-to-end tests.
Sounds like a dream come true, right?
Well, it is not. It is a nightmare.
What happens when you have to make a simple business rule change? You have to update half a dozen test beds.
It only gets worse from there.
What happens when you have to do a simple refactor? The amount of test code you have to update grows exponentially with the size of your change.
It’s only then that you truly realise that tests are also code. More code means more bugs and more maintenance. It means technical debt and slower development.
So you start to question the value of tests. You start to think about testing strategy and good testing practices. You start to think about what to test and what not to test.
But you don’t get there without writing a lot of crappy tests for years and years.
How many people will have this level of experience with testing? And what are the chances that your favourite AI model’s training data includes tests written only by experienced testers?
Funny, isn’t it? You thought you were writing tests to reduce technical debt. When you overdo it, they become a form of technical debt themselves.
Well, the problem with AI-generated tests is that they truly are crappy.
You ask it to write tests for a piece of code, and it will test everything. Go try it now.
There will be tests for the happy paths, the sad paths, the edge cases, forgotten debug statements and other pointless side effects, and even subtle bugs you introduced without thinking them through. How is generative AI supposed to know whether something is a bug or a feature?
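To make that concrete, here is a hypothetical sketch of the kind of test an agent will happily produce. The `apply_discount` function, its leftover debug log, and its rounding quirk are all made up for illustration.

```python
import logging
from unittest.mock import patch

# --- toy implementation under test (made up for illustration) ---
logger = logging.getLogger("pricing")


def apply_discount(total: float, code: str) -> float:
    logger.debug("applying discount %s", code)   # forgotten debug statement
    if code == "SAVE10":
        return round(total * 0.9, 2)             # rounding quirk: bug or feature?
    return total


# --- the kind of tests an agent happily generates for it ---
def test_logs_debug_message():
    # Pins down the leftover debug log: breaks as soon as someone deletes it,
    # even though the behaviour users care about is unchanged.
    with patch.object(logger, "debug") as mock_debug:
        apply_discount(100, "SAVE10")
        mock_debug.assert_called_once_with("applying discount %s", "SAVE10")


def test_rounding_quirk_is_preserved():
    # Faithfully encodes the rounding behaviour, intended or not,
    # so a possible bug is now protected by the test suite.
    assert apply_discount(99.99, "SAVE10") == 89.99
```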
To top it off, how many of those tests are actually meaningful — testing what really matters?
There is a reason why TDD worked as well as it did and became as popular as it did. It forces people not to look at the implementation first, but to think about what needs to be done.
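For contrast, a requirement-first test for the same made-up function might look like this: a minimal sketch, written from the behaviour you actually want rather than from the code that happens to exist.

```python
# Written from "a valid code gives 10% off; anything else leaves the total alone",
# before looking at (or generating) the implementation.
def test_valid_code_gives_ten_percent_off():
    assert apply_discount(200, "SAVE10") == 180


def test_unknown_code_leaves_total_unchanged():
    assert apply_discount(200, "BOGUS") == 200
```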
You aren’t going to get that with your AI-generated code and its attached passing crappy tests.
Maybe we should hand-write all our tests?
Hell no. Those days are long over.
The solution to the AI problem at this point is more AI. We need to prompt better.
Here are some ways I’ve seen tests get derailed, and how to fix them:
| Problem | Solution |
|---|---|
| Tests are too broad | Ask the agent to write tests focused on the requirements |
| Tests are too focused on implementation | Tell the agent not to test implementation details (e.g. logging) |
| Tests are not meaningful | Ask the agent to cover the most important scenarios and edge cases |
| Tests are too verbose | Ask the agent to write shorter tests |
| Tests are missing key scenarios | Ask the agent to add the specific scenarios |
| Over-testing | Figure out a good testing strategy and put it in your AGENTS.md (see the sketch below) |
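For that last row, a testing-strategy section in an AGENTS.md might look something like this. The specific rules are illustrative, not a one-size-fits-all recommendation; write down whatever strategy you actually want your agent to follow.

```markdown
## Testing strategy

- Test the behaviour described in the requirements, not implementation details
  (no assertions on logging, private helpers, or call order).
- One focused test per scenario; prefer a few meaningful cases over
  exhaustive permutations.
- Do not add tests purely to raise coverage numbers.
- When a test fails, fix the code or question the requirement; do not
  weaken the assertion just to make it pass.
```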