Nobody has actually told me why I should care about the number. Oh sure, we can all agree that 17% looks pretty bad and 95% looks pretty good, but so what? If you are at that 17%, what are you going to do about it? I know a number of successful, apparently stable products that are much worse than 20% - in fact, nearly every program written before 2005 had 0% coverage and nobody knew to care. I also know of code with > 90% coverage that had (has?) significant bugs.
People who use code coverage tell me that it is useful when:
You have a new team member. Looking at coverage for the code he creates tells you whether he is writing tests, and you can have a conversation about team expectations if his coverage is not close to what the rest of the team is doing.
You think you are doing TDD. If you see less than 100% coverage, you didn't TDD that code: those are the places to go back, delete the code, and only rewrite it once you have the failing test case that requires it.
You have a legacy system with some tests. By weighing coverage against relative risk you can decide which areas to focus technical effort on first. Pragmatically, when working with legacy code you cannot fix everything today (customers won't buy technically better, they buy new features), so you need to prioritize. You should be able to get some time/money from the business for technical work (if you can't, you have other problems), and coverage is one input to the decision of what to spend that time on.
I'll be honest: I've only heard of teams successfully using coverage in the ways above. I've never seen any of it on a project I've been on. By contrast, I have seen all of the following ways that code coverage can be bad. Code coverage is dangerous if:
You think code coverage is a sign of quality. It is very easy to write a test that covers a line of code without actually testing that the line works. The following trivial example gets 100% coverage, but the code is wrong (a version of the test that would actually catch the bug is sketched after this list).
int add(int a, int b) { return 1; }
TEST(add) { add(2, 2); }
You assign a target number. This is partly related to the above: people will sometimes write bad tests just to hit the target. Ignoring that, some code needs more testing than other code. Some code just cannot be usefully tested - multi-threaded code may have a bunch of untested mutexes - and in a compiled language getters and setters don't need testing: how could they possibly fail? In the first case you rely on careful code review because nobody has figured out how to write useful tests; in the second nobody cares because the code can't possibly be wrong. On the other hand, for single-threaded business logic classes 100% code coverage shouldn't be hard, so there you should go well above the target.
Your boss knows what the numbers are. This is a variation of both of the above with a twist worth noting: because the boss is looking, you should expect coverage numbers to show up in your yearly review, where they might affect your pay. While not morally justified, it is fully understandable that people will write extra "tests" of no value just to ensure they exceed his target coverage number.
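For contrast with the trivial example above, here is a minimal sketch of what an assertion-based version of that test might look like - assuming a bare assert() rather than any particular test framework, since the TEST macro above could come from any of them. It produces the same 100% line coverage, but unlike the empty test it actually fails and exposes the bug:

#include <cassert>

// The same deliberately broken function from the example above.
int add(int a, int b) { return 1; }

// Covers the same line as before, but now it checks the result,
// so it fails at runtime: add(2, 2) returns 1, not 4.
void test_add() {
    assert(add(2, 2) == 4);
}

int main() {
    test_add();
    return 0;
}

The coverage number is identical either way; only the assertion tells you anything about whether the code works.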
Given that I've never seen a good use for coverage myself, I have to ask why we bother to measure it. I would advise you not to measure coverage, on a YAGNI basis: if you ever do decide to use it usefully, you can start measuring it later.