The post on testing like the shuttle noted that the software used on the shuttle was old. That raises the question: is quality just a matter of time - maintain software for years and eventually you fix most of the bugs? Every year that passes is another year where programmers look at the code again and fix something "odd". Every month is one more chance to finally figure out that rare crash. Does this mean that over time there is less bad code?
Unfortunately, getting data is hard: most companies are secretive about their processes. Even when they are not, it is hard to do real science when you can't even get an apples-to-apples comparison, much less a true control case. We can look at open source for some hints, but we need to be careful not to put too much weight on those hints as if they were the truth.
Some large open source projects have found that time works. Linux, Apache, and many others have a reputation for quality. When a few dedicated programmers give years of their lives to the code, and are allowed enough time to make things better as opposed to just fixing critical bugs or adding new features, the code quality improves. Those programmers come to know the entire system well enough that they are not afraid to make major changes for the better. If they are guided by tests (which these projects are moving toward, though in general they don't have many yet) there is a high likelihood that those changes won't introduce new bugs. In the open source world these people are generally the maintainers and have veto power over all contributions. When the right programmer is given the power to refuse new money-making features, it does force the quality of the project to increase.
The open source model has failures as well. OpenSSL, for example, has been around for a while, and has been in the news recently for some serious bugs. The maintainers don't seem to care about quality, and so they have critical bugs that get worldwide attention. The project has implied that the money that funds it comes from people who want features and don't care about bug-free code, and so they don't worry about major bugs.
From what I can tell, the commercial world is a mix as well. On the one hand, pressures of money and schedule mean that companies often do not give programmers time to make required improvements. On the other hand, many companies have learned the hard way that poor code quality is expensive too. Companies normally say they want to strike a balance: if they don't limit the engineers nothing will get done and the company goes out of business, but if they don't put anything into maintainability the software ends up too expensive to maintain.
The idea of a balance is a good one. However, in my experience it often gets lip service without becoming reality. It is easy to say "quality is important", but then day-to-day decisions prove that quality isn't important. There are always competitive pressures that make management want to get the current work done "quick and dirty" now, then come back "tomorrow" to clean it up. Then when tomorrow comes the next feature is more important. The solution here is easy: management needs to ensure they don't create emergencies out of normal situations. Software is always late; quit pretending that is abnormal and manage it.
Even when management isn't directly standing in the way, programmers don't always make things better; instead they put a "band-aid" on, which leaves the whole uglier but gets the job done. Over time this causes the code to decay rather than improve. Even if the programmer cares, anything but the most trivial changes are hard to get in because there is the (legitimate) worry that the change might break something else.
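To make that concrete, here is a minimal sketch of the difference between a band-aid and a real fix, using an invented shipping-cost rule (the functions and the rule are hypothetical, not from any real project). The regression test at the end is what makes the larger change safe to land:

```python
# Hypothetical example: the documented rule is "free shipping over $50".
def ship_cost(order_total, country):
    if country == "CA" and order_total > 48.99:  # band-aid: one-off exception for a complaint
        return 0
    return 0 if order_total > 50 else 5

# The better fix states the rule once, in one place.
FREE_SHIPPING_THRESHOLD = {"default": 50, "CA": 48.99}

def ship_cost_clean(order_total, country):
    threshold = FREE_SHIPPING_THRESHOLD.get(country, FREE_SHIPPING_THRESHOLD["default"])
    return 0 if order_total > threshold else 5

# A regression test like this is what makes the cleanup safe: if it passes,
# the change did not break the behavior anyone depends on.
def test_clean_version_matches_old_behavior():
    for total in (10, 48.99, 49, 50, 50.01, 200):
        for country in ("US", "CA"):
            assert ship_cost(total, country) == ship_cost_clean(total, country)
```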
There is one other part missing: many companies want to treat people like interchangeable parts - hire some contractors when there is work, switch people to a different project when the work is done. When you don't really understand the code in question you can't make an improvement: programmers who don't understand the big picture are likely to destroy a good design in their misguided efforts to make things better.
The result of this over the years: eventually the company gives up and does an expensive big rewrite to "fix all the problems".
The question then becomes what is better in the long term: to start over from scratch every 10 years or so when the code becomes unmaintainable, or to spend extra time and money over those 10 years to keep the code maintainable. The big rewrite is expensive: not only do you pay most of the cost of writing the old system again at tomorrow's post-inflation prices, but you also have to pay to maintain the old system until you can switch to the new one. Don't forget that the old system has become hard to maintain, so you are paying extra money just for the work it needs. Or you could instead invest a little extra money every year in improving quality, so that things never reach the point where you need the big rewrite.
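As a rough back-of-the-envelope illustration, with numbers made up purely for the sake of the comparison (they are not from any real project):

```python
# Made-up numbers for a system maintained for 10 years either way.
normal_maintenance_per_year = 150_000

# Option A: let quality slide, then do the big rewrite at the end.
decayed_maintenance_per_year = 250_000   # the old system has become hard to work on
rewrite_cost = 1_200_000                 # writing it again at post-inflation prices
option_a = 10 * decayed_maintenance_per_year + rewrite_cost

# Option B: spend a little extra every year keeping the code maintainable.
quality_investment_per_year = 50_000
option_b = 10 * (normal_maintenance_per_year + quality_investment_per_year)

print(option_a, option_b)  # 3700000 vs 2000000 under these assumptions
```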
That might make it sound obvious that spending money on quality is the better idea, but this may not be true. Car manufacturers tweak their cars every year, but they still have teams start over to do ground-up re-designs because there is only so far tweaks can get them. Those ground-up re-designs let them take advantage of new processes, styles, and materials that are not possible with the design of the existing car. Likewise in programming, starting from the ground up is the only way to re-think many early core decisions. The important point is not to keep quality up so you never have to start over, but that starting over should be a business decision made in advance.
When you read the above you are probably thinking about Joel on Software's advice to never rewrite your software. However, his advice is flawed. What is missing from it is that if you choose to re-write you need to commit to two fully funded teams for several years: one continuing on with the current code to keep it competitive in the market, the other doing the re-write. If you cannot afford this very expensive cost, then a re-write is not for you. If you can afford it, it might be right. Over the very long run the rewrite can make you more money than sticking with the current code would. Where Netscape failed was not that they decided to do a re-write; it was that they didn't fund maintenance of the old code until the new was ready. Firefox is now a dominant browser, so while the full rewrite cost Netscape their company, the rewrite actually did well over the long term, and it seems likely that the browser code they had before the re-write would not have brought them here. (Though one wonders if the situation would be different had Microsoft not also abandoned their browser for many years, giving Firefox time to get ahead.)
Which model is right for your project? Are you going to maintain the same software year after year, slowly adding features as required, or are you going to start over every once in a while, creating a system that is suddenly better and lets you add more features fast for a time until it decays to an unmaintainable mess?
If you choose the first option, you have to accept that in the short run you may have to be late with useful new features just because they don't fit your quality standards without significant technical work. It also means you keep a few people for many years, always working on the same code, and they need plenty of time for technical improvement. Those people need a passion for quality, and the right to make decisions that are wrong in the short term for the long-term good.
If you choose the second, you can get to market faster in the short term: make it work now. Avoid refactoring; only make things better when there is no choice. The only parts of your code that are maintainable are the parts that change so often that you have to make them maintainable. As you go along everyone notes the mistakes they made, and the team making the next version starts with a design that avoids them. I will add a caveat: guard against the second-system effect.
There is one tricky part of the second plan that you need to work on carefully: the transition plan. Since you are planning on giving up on the current software, you need to ensure that when you write the new software your users can transfer over to it. Therefore your data storage formats need to be carefully documented, and you need to test that your data actually follows the format.
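As a minimal sketch of that kind of test, assuming the data happens to be stored as JSON and the documented format can be expressed as a set of required fields (the file name and field names here are invented for illustration):

```python
import json

# Hypothetical documented format for a saved customer record.
REQUIRED_FIELDS = {"id": int, "name": str, "created": str}

def format_violations(record):
    """Return every way a record violates the documented format."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append("missing field: " + field)
        elif not isinstance(record[field], expected_type):
            problems.append(field + " has the wrong type")
    return problems

def test_stored_data_follows_documented_format():
    # In a real project this would walk all of the actual data files.
    with open("customers.json") as f:
        for record in json.load(f):
            assert format_violations(record) == [], format_violations(record)
```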
This isn't actually an either-or choice. Just as car makers generally carry the same engines over, you can choose to keep old parts of the system that are working well. In fact, in every "ground up redesign" I've worked on, some parts of the old system were kept. When you have complex business logic that works, it is often better to re-use it, wrapping it in a new UI and fixing the foundational architecture that is broken. The parts you aim to keep need to be kept maintainable, but not the whole.
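A small sketch of what "wrap it in a new UI" can look like, using an invented legacy pricing calculation as a stand-in for the proven business logic you keep (everything here is hypothetical):

```python
# Stand-in for the legacy module: ugly but correct business logic we keep.
def legacy_calculate_total(order_lines):
    total = 0.0
    for qty, unit_price in order_lines:
        total += qty * unit_price
        if qty >= 10:                      # long-standing bulk discount rule
            total -= qty * unit_price * 0.05
    return round(total, 2)

class PricingService:
    """New facade the rewritten UI talks to; the proven logic stays behind it."""

    def quote(self, customer_id, items):
        # Translate the new system's data into the shape the old code expects.
        order_lines = [(item["qty"], item["price"]) for item in items]
        return {"customer": customer_id,
                "total": legacy_calculate_total(order_lines)}

# The new UI only ever sees the clean interface.
print(PricingService().quote("c42", [{"qty": 12, "price": 3.50}]))
```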
Even if you choose the first, all is not well. We have not yet figured out the best way to program. If you had started writing a program in 1955 for maintainability, you would have real problems, because in 1955 we didn't have any useful programming languages (see Wikipedia's timeline of languages). If you started a few years later, COBOL probably seemed like a good language, but today we would disagree.
Choice of language isn't the only issue; we have also learned things about writing good programs. Structured programming solved a lot of major problems that in 1955 we didn't anticipate. Object-oriented programming came next because structured code isn't enough. There are some who believe that functional programming will be next, though it hasn't caught on yet. Our current best practices leave much to be desired, but we don't yet know of anything better.
Then there are project-specific issues. Early design decisions often prove not to scale well for some unanticipated need years later. When those design decisions are at the core of how your software is designed, there is no way to retrofit without months or even years where the project is not shippable. It is often better to start over if you find yourself in this situation.
History says that you will probably still need to give up and re-write someday. However, with some care you can skip a rewrite or two, and overall save money. In some cases you can delay the rewrite until your project is obsolete and not needed anymore.
Which is right for your project? That is your choice. Only you know the pressures and the long-term situation that apply to your project.