It’s not a bug, it’s a defect, stupid!
  (Part 2)

Today, in the field of software development, quality assurance is one of the most misinterpreted terms. It’s a kind of history repeating itself, or let’s put it this way: it seems we find ourselves at the point where we started some 50 years ago.

Flashback: 50 years ago?! What was it like back then?

Well, 50 years ago, I was a two-year-old toddler. Watches, cameras and transistor radios were “expensive” consumer products, just like mobile phones are today. From my childhood, I remember that many of these high-value consumer products carried an oval golden “Passed” sticker, which indicated that the product you held in your hands had passed the final quality control, which was a functional test. The products that didn’t function properly were discarded, while only the well-functioning devices were packed and shipped.

The customers who bought these mostly Japanese products relied on these “Passed” labels. The label told them that the product worked.

This process was fairly inefficient. Tons and tons of precious products had to be scrapped because they were defective. Depending on the product, the waste was in the single-digit percentage range, sometimes even in the double digits.

Testing a product means finding a defect after the fact – after it has already happened.
Thus the defect voids a substantial part of your production capacity: you’ve put a lot of effort into production, and subsequently you invest a lot more to iron out the defect.

And what is the fact? Well, the fact is that you’ve applied a production process that made the defect possible.

When you build a complex product, repairing a defect sometimes becomes very difficult, and sometimes even impossible, so you have to scrap the whole product. Well, that is a pretty unsatisfactory situation.

There are several reasons why it can become unacceptable for defects to happen in the first place:

  • they cause damage to your customers,
  • they damage your customers’ trust in your company,
  • they waste production capacity,
  • they require additional effort to be corrected.

All of which sums up to cost, or rather to “return on net assets”.

  • Your quality issues hand a clear advantage to any competitor who produces fewer defects than you.

In the 1970s and the following decades, industrial leaders and scientists understood this problem very well. They set out to research and understand the nature of defects, and they found a few very important commonalities that appear to be forgotten in mainstream software development:

  1. When you find a defect, there is a root cause which led to this defect. Eliminating this root cause theoretically stops this defect from ever happening again.
  2. When you test your products, you only find the defects you know about; you can’t detect the defects that haven’t yet come to your knowledge.
  3. Defects have a social character – they tend to gather in particular places, and there are several reasons for that. For instance, a risky development or production strategy may cause a whole range of defects, or a poorly trained engineer produces defects in exactly the field of his particular shortcomings.

So after a defect has popped up in the field, after the product has been shipped, you cannot prevent this particular defect any more, but you can prevent it in future products, and there are two strategies:

  1. You can test your product for this particular defect before you ship it. This may serve as a band-aid for the time being, but if you do this continuously, it will cost you a lot of resources.
  2. You research how this defect could happen in the first place, you find the root cause and eliminate it. After that, the defect never occurs again, you will never detect it in your tests again, and you will finally ask yourself why you are testing for a defect that never occurs.

The second strategy is what happened in Japan with the final inspection of the products. After design errors and problematic manufacturing steps had been fixed, suddenly all cameras passed the final inspection without a single defect detected, and financial controllers rightfully questioned the purpose of a final inspection. Finally, in the late 1980s, it was decided to abolish the final inspections. The number of defects detected had fallen below 1 in 6 million (6σ).

Testing had become uneconomic!

The strategy of researching the root cause required a comparatively high effort, but only for a reasonably short time, and that difference made it much more cost-effective than the ongoing final inspections.

The lesson is: Avoiding a defect is much cheaper than fixing it.

When you find a defect during testing, you didn’t prevent it from happening, you just found it!

That doesn’t mean that testing is obsolete,
BUT
testing for known defects – in the sense of defects that have already occurred in the field – is, to some extent, nonsensical.

This is where Requirements Engineering comes into play.

Requirements Engineering as a Source for Test Design

I’m a software engineer and I tend to see things from this angle, but I believe that this can be transferred to many other fields of engineering.

So when you start to elicit the features and functions of a product in today’s engineering world, you begin by defining features, and from there you derive the high-level functionality:

  • functional requirements
  • non-functional requirements
  • constraints

and from there you work your way down to very detailed technical specifications.
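To make this concrete, here is a small, purely hypothetical sketch (the feature, the names and all numbers are invented for illustration, not taken from a real product): a “shipping cost calculation” feature might end up, at the bottom of that chain, in a detailed specification as concrete as this:

  # Hypothetical feature: "shipping cost calculation"
  # Functional requirement:     compute the shipping cost from weight and zone
  # Non-functional requirement: the calculation must complete within 100 ms
  # Constraints:                0 kg < weight <= 30 kg, zone is one of 1, 2, 3
  #
  # A sketch of the resulting detailed technical specification:
  def shipping_cost(weight_kg: float, zone: int) -> float:
      """Return the shipping cost in EUR for a parcel of the given weight and zone."""
      if not (0 < weight_kg <= 30):
          raise ValueError("weight outside the specified range (0, 30] kg")
      if zone not in (1, 2, 3):
          raise ValueError("unknown zone")
      base_rate = {1: 4.90, 2: 8.90, 3: 14.90}[zone]
      return base_rate + 0.50 * weight_kg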

At this point, some of you may feel reminded of the V-Model,

which can very well be viewed as an aspect of the Capability Maturity Model (CMM and CMMI, respectively), but in the real world there are iterative processes in place. No serious developer simply skips testing the modules he develops. So with every development step accomplished, a developer runs a test to see if it functions properly, and with every step forward these tests become more elaborate. Developing these tests can be a substantial part of the work. In the recent past I’ve seen a few wisenheimers publishing articles about this process, which every reasonable developer implements on his or her own anyway. (Pleeease, don’t prove me wrong and don’t tell me that you are not doing this!)

This test suite, which is developed along the way during the course of software development, serves several purposes (a small sketch follows below):

  • it improves the reliability of the code,
  • it makes sure that what is being developed can be tested at all.
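
For illustration, a developer test of this kind, written against the hypothetical shipping_cost sketch from above, might look like this (again, all names and values are invented):

  # A minimal developer test that grows alongside the implementation.
  # It documents the expected behaviour and keeps the code testable.
  import unittest

  from shipping import shipping_cost  # hypothetical module holding the sketch above

  class ShippingCostTest(unittest.TestCase):
      def test_regular_parcel(self):
          # 2 kg parcel to zone 1: 4.90 base rate + 2 * 0.50
          self.assertAlmostEqual(shipping_cost(2.0, 1), 5.90)

      def test_unknown_zone_is_rejected(self):
          with self.assertRaises(ValueError):
              shipping_cost(2.0, 99)

  if __name__ == "__main__":
      unittest.main()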

During these first steps of development we end up with a lot of new questions, which mean nothing less than that the requirements, the use cases, the user stories, the system analysis and finally the system specifications may be incomplete and/or contradictory and need to be revised. So whenever a software engineer comes up with such a question, he MUST stop developing and first clarify it.

This brings us to a more refined version of the model shown above, the W-Model.

You have probably already noticed that the steps in each model (V and W) are interlinked vertically and horizontally, and you should understand this as a feedback process which should be implemented in the development process. A test shall be specified from the system specification. Let’s look at the lower level, the level of the system detail specification. Here it can very often become pretty mathematical. You may define a very specific function and specify that the values A, B, C and D have to lie within particular ranges, which brings us to a small but important bit of set theory: the ranges of the admissible input values define the domain of that function, while the set within which the results of that function may vary is its image.

This is very basic stuff, but it is important to keep track of these elementary issues, because this is how we define the tests for system verification.
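
Sticking with the hypothetical shipping_cost sketch from above: its specified domain is 0 < weight <= 30 and zone in {1, 2, 3}, and its image is the set of costs those inputs can produce (here roughly 4.90 EUR up to 29.90 EUR). Verification tests can be derived directly from these sets, for example by probing the boundaries of the domain and checking that every result stays within the image:

  # Verification tests derived from the specified domain and image,
  # not from defects that have already been observed in the field.
  import unittest

  from shipping import shipping_cost  # hypothetical module, see the sketch above

  class ShippingCostVerification(unittest.TestCase):
      def test_domain_boundaries(self):
          self.assertAlmostEqual(shipping_cost(30.0, 3), 29.90)  # upper bound is included
          with self.assertRaises(ValueError):
              shipping_cost(0.0, 1)       # lower bound is excluded
          with self.assertRaises(ValueError):
              shipping_cost(30.01, 1)     # just above the domain

      def test_results_stay_within_the_image(self):
          for zone in (1, 2, 3):
              for weight in (0.1, 15.0, 30.0):
                  cost = shipping_cost(weight, zone)
                  self.assertGreaterEqual(cost, 4.90)
                  self.assertLessEqual(cost, 29.90 + 1e-9)  # small tolerance for float rounding

  if __name__ == "__main__":
      unittest.main()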

This level of tests should be defined regardless of the errors that have actually happened out there in the field, and if these tests are defined properly, a whole class of errors will never be seen in the field at all. If the software engineer has internalized this before writing his own tests, these errors will never even reach the test engineer, and so forth.

Of course, at this point we hit the issue of meaningful limits, because these tests may become very time- and resource-consuming, but all these considerations substantially contribute to a sound understanding of what we are doing. And with that understanding between our ears, we are much less likely to produce stupid errors.

Why I wrote this article

I came to write this text through an example of a risky strategy that is likely to yield errors: code branching and merging.

In many fields of software engineering, we assign particular technical leaders, who in the best case lead groups of no more than two or three people. These technical leaders are highly trained and experienced engineers who not only know the theoretical background of engineering but also know the problem domain they are dealing with inside out. This little team, or this individual software engineer, may be responsible for the development and maintenance of a library or a particular system facility. The overall system has also been designed with these engineering responsibilities in mind.

So, splitting up and assigning these engineering tasks is an important aspect of system architecture. During the times when I was doing a lot of software engineering, merging different branches of the same source module was a particularly problematic thing. It may have been interesting to implement a special customer request and produce a derivative of a source that finally led to a custom release of the software, but one day we wanted to merge this back into the main branch, because the effort of maintaining two or more branches had become too high. And then the mess started! Sometimes the data models differed. Sometimes even the transaction models differed, and so on. These merges were nightmares; merging two branches became horrifying and often led to the decision to re-implement the functionality of that branch in the main branch and then shut the side branch down.

The process of merging often led to errors that could only be ironed out by rolling back to an older version. Branching and merging turned out to be a major source of complicated software defects.

I’ve seen many software development organizations which practically forbade their engineers to create branches; engineers needed to come up with very good reasons, and it had to be clear that the branch would NEVER have to be merged back into the main branch.

In the early 2000s, Linus Torvalds became increasingly dissatisfied with the situation around Linux kernel development. Linux is developed by software engineers around the world, and often engineers in many different countries are messing around with the same source module. This was a permanent nightmare. Torvalds and his fellow maintainers needed much better tools to control branches and to merge them. This is when Git was born.

The situation that led Torvalds to create Git does not apply to most “normal” software development organizations, where practically all engineers work in the same office and where access to source modules is assigned and controlled. Nevertheless, young software engineers eagerly jumped on that new train – without need. Since they now had a tool at hand that allowed branching and merging in highly complex engineering environments lacking such organization, they started branching and merging and viewed it as the state of the art.

Without need, they gave up the organizational provisions that had allowed their more experienced colleagues to live happily without branching and merging.

This is not about a new technology. It’s about a tool, and reliance upon tools has always had a drawback in software engineering, because it causes particularly young engineers to do unnecessary things and then compensate for the consequences of their actions with yet more tools. There are many of these things around, and I’ve seen it all too often that companies tried to solve problems with tools rather than with technology.

Particularly in smaller organizations, branching and merging should not occur, and if it does, it’s a critical indicator of organizational shortcomings.

These shortcomings should be addressed with processes and not with tools.

Thinking about this issue, I came up with the article above, since branching and merging is one of these constant sources of software defects.