Don’t seek cover behind drywall

June 17th, 2006

Before getting up this morning I reached out for an issue of Wired magazine from the pile under my bed.

I read an article on a training facility for US soldiers going to Iraq. Though totally unrelated to software, one thing struck me: while practicing, the soldiers use laser equipment to simulate bullets. As drywall or any other non-transparent material blocks light, the soldiers train themselves to simply hide instead of going for cover.

Association chain: cover - coverage - tests. Well, here we are.

Back to the picture: as a seasoned CS player I know the difference first hand (unlike real soldiers, I am still able to talk about it). Software is not far off from a game; if you introduce a bug, you can correct it later (errm, not in all cases). Well, at least if your real-estate site crashes, no one gets killed.

With test cases it is sometimes the same: they assure you that everything is OK. You might have 98% coverage and all tests pass. Then after release a bug comes from nowhere and takes the system down. Where do these killers come from? While you have perhaps validated every single function in isolation, you will never have reasonable coverage of all the possible scenarios that execute them in sequence.
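To make that concrete, here is a minimal sketch in Python. It is a made-up toy, not the insurance system from this post: the `Contract` class, its methods and the numbers are all hypothetical. Each method has a passing test of its own and every line gets executed, yet one particular sequence of calls still produces the wrong premium.

```python
class Contract:
    """Hypothetical toy contract, invented purely for illustration."""

    def __init__(self, premium=100):
        self.premium = premium
        self.applied = []    # alterations currently in effect
        self.cancelled = []  # alterations that were cancelled

    def alter(self, delta):
        self.premium += delta
        self.applied.append(delta)

    def cancel_last(self):
        delta = self.applied.pop()
        self.premium -= delta
        self.cancelled.append(delta)

    def redo(self):
        # Deliberate bug: re-applies the OLDEST cancelled alteration
        # instead of the most recent one, and never removes it.
        delta = self.cancelled[0]
        self.premium += delta
        self.applied.append(delta)


def isolated_tests_pass():
    """One small test per method; together they execute every line."""
    c = Contract()
    c.alter(10)
    ok_alter = c.premium == 110

    c = Contract()
    c.alter(10)
    c.cancel_last()
    ok_cancel = c.premium == 100

    c = Contract()
    c.alter(10)
    c.cancel_last()
    c.redo()
    ok_redo = c.premium == 110

    return ok_alter and ok_cancel and ok_redo


def sequence_result():
    """Two cancel/redo rounds in a row; the intended premium is 115."""
    c = Contract()
    c.alter(10)
    c.cancel_last()
    c.redo()
    c.alter(5)
    c.cancel_last()
    c.redo()  # re-applies 10 instead of 5
    return c.premium


if __name__ == "__main__":
    print("isolated tests pass:", isolated_tests_pass())
    print("premium after the sequence:", sequence_result())
```

With only three operations there are already 3**n possible call sequences of length n, so walking through all of them stops being an option very quickly.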

My favorite bug of this kind could be reproduced in the following minimal scenario:

  • Insurance contract live for at least 15 years (this implied that 15*12*6 = 1080, roughly 1000, batch tasks had been applied in the background)
  • 3 uncommon alterations made to the contract in this period, where the 2nd got canceled and redone

After I found out what went wrong I had the greatest respect for the tester who conceived this test case, until he confessed that it came from a standard scenario in which he made two(!) errors as he went through it.

In fact the bug came from a design flaw; for the fix I had to rewrite large portions of the sequencer (>400 classes). The irony here: the sequencer had been one of the few components where we had unit tests (it was back in the C++ days when things like JUnit had just come out). Design flaws are rarely found by test cases, as the tests either follow the design (unit tests) or duplicate the design (scenarios).

As any design element comes down to some code that implements it, one might think that there could be a test for it. Unfortunately most of these tests are just “drywall”: they assure you that you have “covered” your code, but they guard you only against “visible” bugs. The paradox is that the design elements have to take care of themselves, because that’s what they were designed for in the first place.

The paradox continues: the code that manages to hide itself completely seems to be impregnable, mostly because it is protected by a concrete wall of design elements that keep it from getting unexpected data (in my case the actual implementations of the commands in the sequencer were all fine).

Moral: Test-drive code where you can do it meaningfully; carefully design the code that will be hard to test.