Wednesday, January 21, 2015

Can we take lessons from mechanical engineering?

I have to preface this with the note that I'm not a mechanical engineer. I know a little about the subject, but not a whole lot. If you are a real mechanical engineer, please leave comments on what I have wrong. Otherwise take my illustration with a grain of salt.
 
Figuring out which seams to test at is hard on a complex project. Testing only the completed project is too expensive, but if you test anything less than the full product you risk that your tests don't reflect how the parts actually work together. Can we look to other fields for help? To answer this question I looked a little at mechanical engineering to see what they do.

Probably the best description of building complex mechanical systems comes from Richard Feynman's appendix to the Challenger disaster report:

The usual way that such engines are designed (for military or civilian aircraft) may be called the component system, or bottom-up design. First it is necessary to thoroughly understand the properties and limitations of the materials to be used (for turbine blades, for example), and tests are begun in experimental rigs to determine those. With this knowledge larger component parts (such as bearings) are designed and tested individually. As deficiencies and design errors are noted they are corrected and verified with further testing. Since one tests only parts at a time these tests and modifications are not overly expensive. Finally one works up to the final design of the entire engine, to the necessary specifications. There is a good chance, by this time that the engine will generally succeed, or that any failures are easily isolated and analyzed because the failure modes, limitations of materials, etc., are so well understood. There is a very good chance that the modifications to the engine to get around the final difficulties are not very hard to make, for most of the serious problems have already been discovered and dealt with in the earlier, less expensive, stages of the process.

There is of course more to design than can be expressed in one paragraph, but that gives us something to work with.

The first thing mechanical engineers test is the properties of the material in question. In programming we start with standard libraries and algorithms and test them in isolation. We often test situations that cannot happen in the real world. We document the limitations. Unit tests do this very well. If you switch a turbine blade from zinc to aluminum, for the most part the same tests will pass, but some edge cases (max RPM before it explodes) will change, and we can predict with reasonable reliability what those changes are. Likewise, when you switch from bubble sort to merge sort most things will work, but some edge cases (is the sort stable?) will change.
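To make that concrete, here is a minimal sketch in Python (the test names are my own): one test that any correct sort passes, and one that documents the stability edge case. Swap in a new implementation and the first test should still pass, while the second may not.

```python
# Minimal sketch: unit tests that document the "material properties" of a
# sort routine. Any correct sort passes the first test; only a stable
# sort passes the second. sort_fn stands in for whatever implementation
# you might swap in.

def check_sorts_correctly(sort_fn):
    assert sort_fn([3, 1, 2, 1], key=lambda x: x) == [1, 1, 2, 3]

def check_stability(sort_fn):
    # Two records share the key 1; a stable sort keeps "a" before "b".
    data = [(1, "a"), (2, "c"), (1, "b")]
    assert sort_fn(data, key=lambda p: p[0]) == [(1, "a"), (1, "b"), (2, "c")]

# Python's built-in sorted() is documented to be stable, so both pass:
check_sorts_correctly(sorted)
check_stability(sorted)
```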

One digression needs to be made here: mechanical engineering has safety factors (margins). Material testing is generally done for an ideal case of the material as formed in a laboratory. In the real world of manufacturing the same quality might not be obtainable - for example there might be "bubbles" or "cracks" in the real part (for materials prone to this, that is part of the list of properties). Some of this is solved by inspecting parts during manufacturing, but some of it is solved by specifying parts that in theory can handle more stress than they will actually see. This margin is adjusted depending on needs: large expensive parts will in general have minimal factors, while parts that could kill someone will often be three times "stronger" than required, just in case. I don't know of an analog for programming.

From there they create larger components, and assemble those components into larger and larger assemblies. Programs are also made of components, which are then put together into larger and larger parts. In both cases you eventually arrive at a whole. As long as the parts still fit together you can change the pieces inside a component without problems: just retest that part and all the parts higher up.

The most important lesson is that if your change to one part changes the way it connects to the next component, you also need to redesign that component to fit, which means a lot of parts need to be retested. Mechanical engineers have the concept of interchangeable parts. I upgraded the clutch on my car with one from a model four years older, built for a different engine: the upgrade bolted right onto my engine and transmission, so the change was fairly simple even though large parts of my car had been redesigned.

In programming we have figured out how to substitute simple algorithms like sort and a few containers. However, we don't yet have a general concept of how to substitute larger pieces. There are plug-in architectures, but they are pluggable at only a few levels. You can run many different programs, but the programs themselves are generally either monolithic or allow only a few areas of change. We can trade out device drivers and a few other pieces, but those tend to be places where our program touches something external. What is your "clutch" - a part of your program that can nevertheless be traded out? I'm not sure this is actually a valid question though: my son's toys generally have one changeable part, the battery. Maybe most software only has a few plug-ins because changing plug-ins is not useful for most software.
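That said, the mechanism we do have is worth spelling out: depend on a small, stable interface so the part behind it can be swapped like a clutch. A minimal sketch - the Storage names here are made up for illustration, not from any framework:

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """The bolt pattern: any part that fits this interface can be used."""
    @abstractmethod
    def save(self, key: str, value: bytes) -> None: ...
    @abstractmethod
    def load(self, key: str) -> bytes: ...

class MemoryStorage(Storage):
    def __init__(self):
        self._data = {}
    def save(self, key, value):
        self._data[key] = value
    def load(self, key):
        return self._data[key]

# Code written against Storage never learns which implementation it got,
# so MemoryStorage can later be replaced by a file- or database-backed
# part without touching the caller - only the swapped part needs retesting.
def roundtrip(storage: Storage) -> bytes:
    storage.save("greeting", b"hello")
    return storage.load("greeting")

assert roundtrip(MemoryStorage()) == b"hello"
```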

Let's go back to Richard Feynman's description. I have a problem with his approach. He says you start with a turbine blade, but why are we using a turbine in the first place? Those model rockets I made as a kid didn't have one. Of course my model rocket used a different fuel source, but that is the point: before you can decide you need a turbine at all, you need to know what fuel you will use, and that is top-down design, not bottom-up. You need to know a lot of your design beforehand: a turbine that won't fit is a design disaster even if it works well in isolation. This is exactly the same problem I'm facing in programming: eventually in a large system you will design two components that don't quite fit together right, and fixing that means major work on one or both parts.
In conclusion, mechanical engineering appears to face many of the same problems we do. There is no silver bullet there either: engineering is a hard problem, and you have to make compromises, refine guesses, and so on until you get something that works.

Tuesday, January 6, 2015

Are integration tests the answer?

I have been picking on unit tests a lot lately. The obvious question is: do I think integration testing is the answer?

Before I can answer this, we need a definition of integration test. Just like unit test, the definition of integration test goes back to long before automated testing. An integration test is any test that combines two or more units in a single test, with the purpose of testing the interaction between units. Many other authors have attempted to redefine integration test to something that makes more sense in a world of automated tests.

Back to the question: what about more integration tests? The standard answer is no: when an integration test fails there are many possible reasons, which means you waste time trying to figure out where things broke. It is generally agreed that when a test fails you should know exactly what part of the code broke. Since an integration test covers a large part of the code, the failure could be anywhere.

I question that answer. Sure, in theory the code could break for many reasons. However, in the real world there is exactly one reason a test failed: the code you touched in the last minute broke something. The point of automated tests is that we run them all the time - several times a minute is the goal, and once a minute is common. Even the fastest typists cannot write much code in a minute, which leaves a tiny number of places to look for the failure. If a large integration test breaks, you already have the root cause isolated to a couple of lines of code. As a bonus, that area is still in your short-term memory! (Sometimes the fix means changing code elsewhere, which is harder, but where the problem was introduced is obvious.)

Unfortunately there are other problems with integration tests that are also used as reasons not to write them. These reasons are valid, and you need to understand the tradeoffs in detail before you write any tests.

The first problem with integration tests is that they tend to run long. If you cannot run your entire test suite in 10 seconds (or less!) you need to do something. I might write about this later, but here is a short (probably incomplete) summary of things to do: use your build system to run only the tests that actually cover the code you changed; profile your tests and make them run faster; split them into suites that can run in parallel; split them into parts with a scheme where some run all the time and some less often. Use these tricks to get your test times down. There is one more option that deserves discussion: test smaller areas of code. That also gets your test times down - at the expense of all the problems of unit tests.
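As one concrete way to do the "some run all the time, some less often" split, here is a sketch using pytest markers; the slow marker name is my own convention, not a pytest built-in.

```python
import pytest

@pytest.mark.slow
def test_full_pipeline():
    ...  # a big integration test that takes seconds to run

def test_parser_handles_empty_input():
    ...  # a fast test that stays in the tight edit loop

# Run the fast suite constantly and the slow suite less often:
#   pytest -m "not slow"    # every few keystrokes
#   pytest -m slow          # before commit / on the CI server
# (Custom markers should be registered in pytest.ini to avoid warnings.)
```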

A second problem is that integration tests are fragile because of time. Time is easy to isolate in units - most of which don't use time anyway - but the larger the integration, the more likely it is that something will fail because of timing issues. I outlined a potential solution in my testing timers post, but it may not work for you.
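I won't repeat that post, but one common way to isolate time in the first place (which may or may not fit your situation) is to inject the clock rather than read it directly. A minimal sketch:

```python
import time

class SessionTimeout:
    """Expires after limit_seconds; the clock is injectable for tests."""
    def __init__(self, limit_seconds, clock=time.monotonic):
        self._clock = clock
        self._limit = limit_seconds
        self._start = self._clock()

    def expired(self):
        return self._clock() - self._start > self._limit

# In production the real clock is used. In a test we substitute a fake
# one, so the test is instant and never flakes on a slow machine:
def test_session_expires():
    fake_now = [100.0]
    session = SessionTimeout(30, clock=lambda: fake_now[0])
    assert not session.expired()
    fake_now[0] += 31  # "advance" time without sleeping
    assert session.expired()

test_session_expires()
```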

Requirements change over time. Such changes are more likely to hit integration tests because you are testing a large part of the system. When your tests are tiny there are correspondingly only a few possible ways a test can break; larger tests have a larger surface to break when something changes. Thus integration tests are more subject to change. This is not always bad: sometimes a new feature cannot work with some existing feature, and the failing test is the first time anyone realizes the subtle reason why. Failing tests are often a sign you need a conversation with the business analysts to understand what should happen.

An important variation of the above: the user interface is likely to change often. Once you have a feature working you are unlikely to change the code behind it. However, the UI for a feature not only has to let you use the feature, it also needs to look nice, and looking nice is a matter of subjective style that changes over time. If your tests all depend on looking for a particular shade of yellow, then every time tastes change a bunch of tests need to change. A partial solution is a UI test framework that knows about the objects on the screen instead of looking for pixel positions and colors. The objects will change much less often, but even this isn't enough: a widget might move to a whole new screen, which again can break a lot of tests.
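To show the difference with one such framework, here is a sketch using Selenium-style locators; the URL and the element id are made up:

```python
from selenium.webdriver.common.by import By

def test_submit_button_visible(driver):
    driver.get("https://example.com/checkout")
    # Locating by object identity survives restyling; checking "the
    # yellow pixel at (412, 87)" breaks every time the theme changes.
    button = driver.find_element(By.ID, "submit-order")
    assert button.is_displayed()

# Usage: pass in any Selenium WebDriver, e.g.
#   from selenium import webdriver
#   test_submit_button_visible(webdriver.Firefox())
```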

Fragile can also mean the integration test doesn't inject test doubles the way a unit test would. It is very useful to write such tests: the only way to know you are using an API correctly is to actually use it. However, any time the real API is used instead of a test double, you take the risk that something real might (or might not) be there and break the test. A test that needs a special environment can be useful to ensure that your code works in that environment, but it also means anyone without that environment cannot run the test. This is a tradeoff you need to evaluate yourself.
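A minimal sketch of both sides of that tradeoff, using unittest.mock from Python's standard library (fetch_rate and its URL are hypothetical):

```python
import urllib.request
from unittest import mock

def fetch_rate(currency):
    response = urllib.request.urlopen("https://example.com/rates/" + currency)
    return float(response.read().decode())

# Integration-style: exercises the real network stack, but only runs in
# an environment where that service is reachable.
#   assert fetch_rate("EUR") > 0

# Double-style: runs anywhere, but no longer proves the real API works.
def test_fetch_rate_parses_response():
    fake_response = mock.Mock()
    fake_response.read.return_value = b"1.25"
    with mock.patch("urllib.request.urlopen", return_value=fake_response):
        assert fetch_rate("EUR") == 1.25

test_fetch_rate_parses_response()
```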

Perhaps the biggest problem with integration tests is that sometimes you know an error is possible but not how to create it. For example, filling your disk with data just to test disk-full errors isn't very friendly. Worse, there may not be a way to ensure the disk fills up at exactly the right time, leaving some error paths untested. This is only one of many errors you can get in a real-world system that you need to handle but probably cannot simulate well without test doubles.
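For the disk example specifically, a test double can raise the error on demand. A minimal sketch: save_report is a hypothetical function, while ENOSPC is the real errno a full disk produces.

```python
import errno
from unittest import mock

def save_report(path, text):
    """Hypothetical function whose disk-full path we want to test."""
    try:
        with open(path, "w") as f:
            f.write(text)
        return "saved"
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return "disk full, report not saved"
        raise

def test_disk_full_is_handled():
    boom = OSError(errno.ENOSPC, "No space left on device")
    with mock.patch("builtins.open", side_effect=boom):
        assert save_report("report.txt", "hi") == "disk full, report not saved"

test_disk_full_is_handled()
```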
And this list of problems is probably not even complete.

So do I think you should write integration tests? No, but only because the definition is too tied to the manual-testing/unit-testing world. What we need is something like integration tests, but without the problems above. There is no silver bullet here: at some point, no matter which testing philosophy you use, you will hit a limit where you need to examine the tradeoffs and decide what price to pay.