A History of Quality in Software Engineering
Software quality practices have evolved over six decades. What began as a response to the “software crisis” of the 1960s has grown into collaborative specification techniques that bridge the gap between business and technical teams.
Contents
- Evolution of Quality Approaches
- The Software Crisis (1968)
- Structured Programming (1968)
- Fagan Inspections (1976)
- Cleanroom Software Engineering (1980s)
- Personal Software Process (1990s)
- UML and Design Communication (1997)
- Extreme Programming (1996-1999)
- Test-Driven Development
- Behavior-Driven Development (2006)
- The C4 Model (2011)
- Given-When-Then Format
- User Stories
- Specification by Example (2010s)
- Example Mapping
- TDD vs BDD
- Common Pitfalls
- Key Figures
- Further Reading
Evolution of Quality Approaches
1968-1990s: Quality through PROCESS
The early focus was on disciplined processes to catch defects: Fagan Inspections (formal peer review), Cleanroom (defect prevention), and PSP (individual measurement).
1994-1997: Quality through DESIGN COMMUNICATION
Teams needed shared visual languages. UML unified competing notations into a standard for modeling software systems.
1996-2001: Quality through PRACTICE
Extreme Programming shifted focus to lightweight practices: TDD, pair programming, and continuous integration.
2001: The Agile Manifesto
Seventeen practitioners valued “working software over comprehensive documentation,” creating tension with UML-heavy approaches. Source: Agile Manifesto
2006-2011: Quality through LIGHTWEIGHT COMMUNICATION
Bridging business and technical teams: BDD (natural language specs), C4 Model (just enough diagrams), and Specification by Example (collaborative examples).
PROCESS --------> DESIGN --------> PRACTICE --------> LIGHTWEIGHT
1968-1990s 1994-1997 1996-2001 2006-2011
| | | |
v v v v
Inspections UML XP & TDD BDD & C4
Cleanroom Sequence Pair prog. SbE
PSP diagrams CI Example Map
The Software Crisis and Birth of Software Engineering (1968)
The term “software crisis” was coined at the first NATO Software Engineering Conference in Garmisch, Germany in 1968. The conference, attended by over fifty experts from eleven countries including Edsger Dijkstra, Tony Hoare, and Niklaus Wirth, confronted a growing problem: software projects were consistently over budget, overdue, and unreliable.
As Dijkstra later observed in his 1972 Turing Award lecture:
“The major cause of the software crisis is that the machines have become several orders of magnitude more powerful… as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”
The conference deliberately adopted the provocative term “software engineering” to suggest that software development needed the rigor of traditional engineering disciplines. This event marked the beginning of systematic approaches to software quality.
Structured Programming (1968)
That same year, Dijkstra published his famous letter “Go To Statement Considered Harmful” in Communications of the ACM, marking the beginning of structured programming.
Definition: A programming paradigm using block-based control flow instead of arbitrary jumps via goto statements.
The Three Control Structures
Dijkstra showed that any program can be written using just three constructs:
Sequence - Statements execute one after another in order:
read_input()
process_data()
write_output()
Selection - Choose different paths based on a condition:
if user_is_authenticated:
show_dashboard()
else:
show_login_form()
Iteration - Repeat statements while a condition holds:
while items_remaining:
process_next_item()
Core principle: These three control structures are sufficient to express any computable function (the structured program theorem).
Key insight: Dijkstra observed that the quality of a programmer’s code was inversely proportional to the number of gotos used. Code without gotos can more easily be proven correct.
By the end of the 20th century, nearly all programming languages had adopted structured programming constructs. Languages that originally lacked them (FORTRAN, COBOL, BASIC) added support.
Fagan Inspections (1976)
Michael Fagan at IBM developed formal software inspections as a systematic method for finding defects in documents, code, and specifications.
Definition: A formal peer review process with predefined roles, entry/exit criteria, and structured meetings focused solely on defect detection.
Inspection Roles
- Moderator: Leads the inspection and ensures the process is followed
- Reader: Presents the material being inspected to the team
- Author: Created the work product and answers questions about it
- Scribe: Records all defects found during the meeting
The Inspection Process
+----------+ +----------+ +----------+ +----------+ +----------+ +----------+
| Planning |--->| Overview |--->| Prep |--->|Inspection|--->| Rework |--->| Follow-up|
+----------+ +----------+ +----------+ +----------+ +----------+ +----------+
| | | | | |
v v v v v v
Select Author Individual Team finds Author Verify
material, presents review by defects fixes fixes
assign roles context inspectors (NOT solutions) defects
- Planning: Select material to inspect and assign roles
- Overview: Author presents context to the team
- Preparation: Each inspector reviews the material individually
- Inspection meeting: Team identifies defects (not solutions)
- Rework: Author fixes the defects found
- Follow-up: Verify that fixes are correct
Results: IBM reported that inspections located 82% of all errors. The company doubled lines of code shipped while reducing defects per thousand lines by two-thirds. Studies showed 80-90% of defects found with up to 25% resource savings.
Key principle: The inspection meeting finds defects only–solutions come later during rework.
Cleanroom Software Engineering (1980s)
Harlan Mills at IBM developed Cleanroom as a theory-based process for producing software with certifiable reliability levels.
Definition: A software development process emphasizing defect prevention over defect removal, using formal methods and statistical quality control.
Cleanroom Process
+---------------+ +---------------+ +---------------+
| SPECIFICATION | ---> | DEVELOPMENT | ---> | CERTIFICATION |
+---------------+ +---------------+ +---------------+
DEFECT PREVENTION > DEFECT REMOVAL
Traditional Cleanroom
---------- ---------
Code -> Debug -> Test Verify -> Certify
Find & fix defects Prevent defects
Three phases:
- Specification: Requirements analysis, function specification, usage modeling
- Development: Design with correctness verification–developers prove their code is correct through formal reasoning, not debugging
- Certification: Independent statistical testing based on expected usage patterns
Key principle: Developers verify correctness through formal reasoning, not debugging. A separate certification team performs all testing.
Results: IBM’s COBOL/SF tool (85,000 lines of code) showed a ten-fold reduction in defects during testing and five-fold improvement in developer productivity. Only seven errors in three years of production use.
The term “Cleanroom” borrowed from semiconductor manufacturing reflects the focus on preventing contamination (defects) rather than filtering it out later.
Personal Software Process (1990s)
Watts Humphrey at the Software Engineering Institute created PSP to apply process improvement principles to individual developers.
Definition: A structured self-improvement framework where engineers measure and analyze their own performance to reduce defects and improve predictability.
What Developers Track
- Time: Hours spent on each activity (design, code, test, etc.)
- Defects: When injected, when found, and what type
- Size: Lines of code predicted versus actual
“You cannot improve what you do not measure”
Process Levels (Progressive Adoption)
PSP0 --> PSP1 --> PSP2 --> PSP2.1
| | | |
v v v v
Baseline Size & Code & Design
measure- time design templates,
ment estimation reviews verification
- PSP0: Establish baseline measurements (time, defects, size)
- PSP1: Add size and time estimation based on historical data
- PSP2: Add code reviews and design reviews
- PSP2.1: Add design templates and formal verification
Key principle: By tracking personal data, engineers identify their own defect patterns and improve systematically.
Humphrey, known as “the father of software quality,” also created the Capability Maturity Model (CMM) for organizational process improvement.
UML and Design Communication (1997)
While process-focused approaches evolved, another stream addressed quality through visual design communication. The Unified Modeling Language emerged from the “method wars” of the early 1990s, when competing object-oriented notations made it difficult for teams to share designs.
Definition: A standardized visual modeling language for specifying, constructing, and documenting software system artifacts.
The Three Amigos
In 1994-1996, three leading methodologists unified their competing approaches:
- Grady Booch brought the Booch Method, strong in design and construction
- James Rumbaugh brought OMT (Object Modeling Technique), strong in analysis and data systems
- Ivar Jacobson brought OOSE, strong in use cases and requirements capture
They joined at Rational Software and released UML 1.1 in 1997, which was adopted by the Object Management Group (OMG) on November 14, 1997.
UML Diagram Types
UML defines 14 diagram types in two categories:
Structural diagrams (static system view):
- Class, Object, Component, Deployment, Package, Composite Structure, Profile
Behavioral diagrams (dynamic system view):
- Use Case, Activity, State Machine, Sequence, Communication, Interaction Overview, Timing
Sequence Diagrams
Of all UML diagrams, sequence diagrams proved most valuable and survived the agile backlash against heavy documentation. They show how objects interact over time:
+--------+ +--------+ +--------+
| Client | | Server | |Database|
+---+----+ +---+----+ +---+----+
| | |
| 1. request() | |
|------------------>| |
| | 2. query() |
| |------------------>|
| | |
| | 3. results |
| |<------------------|
| | |
| 4. response | |
|<------------------| |
| | |
As Martin Fowler noted: “The primary value of drawing diagrams is communication. Because the purpose is communication, it’s essential to strip away some information so as to clarify other information.”
UML and Agile: The Tension
UML emerged from the “big design up front” tradition–create detailed models before coding. The Agile Manifesto (2001) explicitly valued “working software over comprehensive documentation.”
Many agile teams abandoned UML entirely. Others adopted “agile modeling”–using diagrams for communication without treating them as formal deliverables.
Extreme Programming (1996-1999)
Kent Beck developed Extreme Programming while leading the Chrysler Comprehensive Compensation System (C3) payroll project, starting in March 1996. He refined the methodology with Ward Cunningham and Ron Jeffries, publishing Extreme Programming Explained in October 1999.
Definition: A software development methodology that improves quality and responsiveness to changing requirements through short development cycles, continuous feedback, and close customer collaboration.
XP Values
- Communication: Talk constantly with each other and with the customer
- Simplicity: Build only what is needed now, no speculative features
- Feedback: Test always, release often, review code continuously
- Courage: Refactor aggressively, admit mistakes, discard failing approaches
- Respect: Value every team member’s contribution (added in 2nd edition)
The Original 12 Practices
Planning practices:
- Planning Game: Business and development collaborate each iteration to select and prioritize work based on value and cost estimates.
- Small Releases: Release to production frequently (weeks, not months) so each release delivers concrete business value and enables fast feedback.
- Metaphor: Use a shared story or analogy that everyone understands to guide development and name system components consistently.
- Customer Tests: Customers write acceptance tests that define when a story is complete, providing clear and testable requirements.
Development practices:
- Simple Design: Build the simplest thing that could possibly work, then refactor as understanding grows–no speculative generality.
- Pair Programming: Two programmers work together at one workstation, continuously reviewing each other’s work and sharing knowledge.
- Test-Driven Development: Write a failing test before writing the code that makes it pass, ensuring comprehensive test coverage.
- Refactoring: Continuously improve code structure without changing behavior, keeping the design clean as requirements evolve.
- Continuous Integration: Integrate code into the shared repository multiple times per day with automated builds and tests.
Team practices:
- On-site Customer: A real customer sits with the team full-time to answer questions and provide immediate feedback on decisions.
- Collective Ownership: Anyone can modify any code, requiring coding standards and comprehensive tests to work safely.
- Coding Standards: The team agrees on coding conventions so all code looks familiar and anyone can work on any part.
- 40-Hour Week: Sustainable pace prevents burnout and maintains quality, since tired programmers make more mistakes.
Pair Programming
+--------------------------------------------------+
| WORKSTATION |
| +--------------------------------------------+ |
| | | |
| | shared screen | |
| | | |
| +--------------------------------------------+ |
+--------------------------------------------------+
| |
v v
+-----------+ +-----------+
| DRIVER | | NAVIGATOR |
+-----------+ +-----------+
| Writes | | Reviews |
| code, | <------> | continuously,
| thinks | rotate | thinks |
| tactical | roles | strategic |
+-----------+ +-----------+
Two programmers work together at one workstation. The driver writes code and thinks tactically about the current line. The navigator reviews continuously and thinks strategically about the overall approach. Pairs rotate roles frequently, and partners switch often so knowledge spreads across the team.
Studies showed ~15% more time investment but higher quality and faster knowledge transfer.
Key Practices Explained
On-site Customer: A real customer sits with the development team full-time to answer questions, clarify requirements, and provide immediate feedback. This was radical–previous methodologies treated requirements as documents thrown over a wall.
Planning Game: Business and development collaborate to maximize value. Business writes User Stories on index cards describing desired features. Development estimates effort. Business prioritizes based on value and cost. Planning happens each iteration.
Collective Code Ownership: No individual owns any code. Anyone can modify any part of the system. This requires coding standards and comprehensive tests to work safely.
Continuous Integration: Developers integrate code into a shared repository multiple times per day, with automated builds and tests. This catches integration problems immediately rather than in a painful “integration phase.”
Small Releases: Release to production frequently–weeks, not months. Each release delivers concrete business value. Short cycles mean faster feedback and easier course correction.
Key insight: XP takes practices that work and turns them to “extreme” levels–if testing is good, test constantly; if code review helps, review continuously via pairing; if integration is painful, integrate continuously.
The C3 project was cancelled in February 2000 after Daimler-Benz acquired Chrysler, but XP had already spread. Beck was among the seventeen signatories of the Agile Manifesto in 2001.
Test-Driven Development
TDD emerged as one of XP’s core technical practices but became influential in its own right.
Definition: A development technique where you write a failing test before writing the code that makes it pass, followed by refactoring.
The Red-Green-Refactor Cycle
+-------+
| WRITE |
| TEST |
+---+---+
|
v
+-----------+
| RED | <-- Test fails
| (fail) |
+-----+-----+
|
| write minimal code
v
+-----------+
| GREEN | <-- Test passes
| (pass) |
+-----+-----+
|
| improve structure
v
+-----------+
| REFACTOR | <-- Keep tests green
| |
+-----+-----+
|
+---------> repeat
- Red: Write a test for the next bit of functionality–it should fail
- Green: Write the minimal code needed to make the test pass
- Refactor: Improve the code structure while keeping all tests green
As Martin Fowler notes, the most common mistake is neglecting the third step– skipping refactoring leads to messy code accumulation.
Two primary benefits:
- Self-testing code: implementation only occurs in response to test requirements
- Better design: thinking about interfaces before implementation separates concerns naturally
Theoretical foundation: J.B. Rainsberger grounds TDD in queuing theory–when process B requires reworking process A, efficiency improves by performing part of B before A begins. This eliminates wasteful rework cycles.
The Birth of BDD (2006)
Dan North introduced Behavior-Driven Development as an evolution of TDD that addresses a common problem: developers struggling with where to start testing, what to test, and how much to test at once.
TDD vs BDD Focus
TDD Question: BDD Question:
"How do we test this code?" "What behavior should this system exhibit?"
CODE BEHAVIOR
| |
v v
Implementation User outcomes
details and value
BDD shifts focus from implementation details to behavior and user outcomes. The key insight was applying the same queuing theory principle at the analysis level–implementing features reveals insights about other features.
This practice involves business and technical people writing examples together to establish shared understanding.
The C4 Model (2011)
Simon Brown developed the C4 model between 2006-2011 as a response to teams abandoning architecture diagrams entirely. While agile practitioners rejected heavyweight UML, they still needed ways to communicate system structure.
Definition: A hierarchical approach to visualizing software architecture at four levels of abstraction: Context, Containers, Components, and Code.
The Four Levels
Level 1: CONTEXT Level 2: CONTAINERS
"What is the system?" "What's inside the system?"
+-------------------------+ +-------------------------+
| | | +-----+ +-----+ |
| [Users]--->[System] | | | Web | | API | |
| / \ | | | App | | | |
| v v | | +--+--+ +--+--+ |
| [External] [Mail] | | | | |
| System Service | | v v |
| | | +-------------+ |
+-------------------------+ | | Database | |
For: Everyone | +-------------+ |
(business + technical) +-------------------------+
For: Technical people
Level 3: COMPONENTS Level 4: CODE
"What's inside a container?" "How is component implemented?"
+-------------------------+ +-------------------------+
| API Container | | |
| +--------+ +--------+ | | Class diagrams |
| | Auth | | Order | | | (usually auto- |
| |Handler | |Service | | | generated from |
| +--------+ +--------+ | | source code) |
| \ / | | |
| v v | | Rarely used--too |
| +----------+ | | detailed for most |
| |Repository| | | purposes |
| +----------+ | | |
+-------------------------+ +-------------------------+
For: Developers/architects For: Detailed design
- Level 1 - Context: Shows the system in its environment with users and external systems. Suitable for everyone including non-technical stakeholders.
- Level 2 - Containers: Zooms into the system to show applications, databases, and services. For technical audiences.
- Level 3 - Components: Zooms into a container to show internal components. For developers and architects.
- Level 4 - Code: Class-level diagrams, usually auto-generated. Rarely used as it’s too detailed for most purposes.
Key principle: Good diagrams are about communication, not compliance with a notation standard. Use whatever helps the team understand the system.
Brown has taught the C4 model to over 10,000 people in ~40 countries, reflecting demand for lightweight architecture visualization that agile teams will use.
Given-When-Then Format
The Given-When-Then syntax became the dominant format for expressing specifications. According to Gojko Adzic’s 2020 survey, this format accounts for 71% of usage versus less than 10% for table-based formats.
Structure
- Given - The context or setup (preconditions)
- When - The action or event that triggers the behavior
- Then - The expected outcome
Example:
Given a registered user with valid credentials
When they submit the login form
Then they should see their dashboard
The format won adoption due to a good balance between expressiveness and developer productivity. Its simplicity enabled broader tooling support and better IDE integration.
User Stories and Acceptance Criteria
Dan North’s work on user stories established a template that connects behavior to business value:
As a [role]
I want [feature]
So that [benefit]
Example:
As a homeowner
I want to schedule recurring cleaning appointments
So that I don't have to remember to book each time
This format keeps the focus on who needs what and why, rather than jumping straight to implementation details. Stories serve as placeholders for conversations, not detailed specifications.
Specification by Example (2010s)
Gojko Adzic’s Specification by Example (SbE) formalized the collaborative approach where teams use concrete examples to define acceptance criteria, guide development, and create executable tests.
Key findings from his 2020 survey after 10 years of SbE adoption:
- Teams using examples as acceptance criteria: 22% rated software as “Great” (vs 8% without)
- 47% of teams now define acceptance criteria collaboratively with business
- One-third don’t automate examples–yet automation correlates with 2x quality ratings
Example Mapping
Example Mapping emerged as a structured conversation technique for exploring user stories before development begins. Teams use colored cards to capture different elements of a story.
Card Types
- Yellow card: The user story being discussed
- Blue cards: Rules or acceptance criteria that govern the story
- Green cards: Concrete examples that illustrate each rule
- Red cards: Questions that need answers before development can proceed
Example: Cleaning Scheduling Service
Story (Yellow):
As a homeowner
I want to reschedule a cleaning appointment
So that I can adjust to changes in my calendar
Rule 1 (Blue): Rescheduling must be done at least 24 hours in advance
- Example (Green): Appointment on Friday 2pm, customer reschedules Wednesday at 3pm -> Allowed
- Example (Green): Appointment on Friday 2pm, customer reschedules Thursday at 4pm -> Denied with message “Less than 24 hours notice”
Rule 2 (Blue): Customers can only reschedule to available time slots
- Example (Green): Customer selects Saturday 10am which shows as available -> Appointment moved to Saturday 10am
- Example (Green): Customer selects Saturday 10am which is fully booked -> Slot not shown in available options
Rule 3 (Blue): Rescheduling is free for the first change, then costs $15
- Example (Green): First reschedule of appointment -> No fee charged
- Example (Green): Second reschedule of same appointment -> $15 fee shown before confirmation
Questions (Red):
- What happens if the cleaner cancels? Does that reset the customer’s free reschedule?
- Can customers reschedule to a different cleaner?
- Should we send a confirmation email after rescheduling?
This approach surfaces ambiguity and missing requirements early, when changes are cheapest. If a story has too many red cards, it needs more discovery. If it has too many blue cards, it might need to be split into smaller stories.
TDD vs BDD: Choosing an Approach
Aspect TDD BDD
------ --- ---
Focus Implementation details User behavior
Language Programming language Natural language
Audience Developers Stakeholders + developers
Scope Unit level System interactions
Primary use Code quality/design Requirements communication
Both methodologies write tests before implementing code and create automated test suites.
Use TDD when: prioritizing code quality and design, working in continuous integration environments, or needing rapid developer feedback.
Use BDD when: requiring non-technical stakeholder involvement, building user-centric applications, or dealing with complex business logic needing clear communication.
Common Pitfalls
TDD Pitfalls
- Over-testing minor functions
- Skipping refactoring (the third step)
- Writing code before tests
BDD Pitfalls
- Creating overly vague or excessively detailed scenarios
- Excluding stakeholders from specification conversations
- Failing to automate the examples
Key Figures
See Further Reading for links to their work.
Pre-XP Era
- Edsger Dijkstra: Structured programming, “Go To Statement Considered Harmful” (1968)
- Michael Fagan: Formal software inspections at IBM (1976)
- Harlan Mills: Cleanroom software engineering at IBM (1980s)
- Watts Humphrey: Personal Software Process, CMM, “father of software quality” (1990s)
Design Communication (UML Era)
- Grady Booch: Booch Method, co-creator of UML, one of the “Three Amigos”
- James Rumbaugh: Object Modeling Technique (OMT), co-creator of UML
- Ivar Jacobson: Use cases, OOSE method, co-creator of UML
XP and Agile Era
- Kent Beck: Created XP and TDD at Chrysler C3 project (1996). Author of Extreme Programming Explained and Test-Driven Development. Agile Manifesto signatory
- Ward Cunningham: Co-developed XP practices with Beck. Created the wiki, CRC cards, and FIT testing framework
- Ron Jeffries: XP coach on C3 project, helped codify and spread XP practices. Agile Manifesto signatory
- Martin Fowler: Wrote extensively on refactoring, TDD, patterns, and UML. Agile Manifesto signatory
- Dan North: Introduced BDD in 2006 as evolution of TDD
- Gojko Adzic: Authored Specification by Example and conducted industry surveys on adoption
- J.B. Rainsberger: Connected TDD to queuing theory and its productivity benefits
- Simon Brown: Created C4 model (2006-2011), author of Software Architecture for Developers, founder of Structurizr
Further Reading
Historical Foundations
- NATO 1968 Conference Report
- Go To Statement Considered Harmful - Dijkstra
- Fagan Inspection
- Cleanroom Software Engineering
- Personal Software Process - SEI
Design Communication
- Unified Modeling Language - Wikipedia
- UML Diagrams - Reference guide
- C4 Model - Simon Brown’s official site
- Software Architecture for Developers - Simon Brown
- Sequence Diagrams - Wikipedia
Extreme Programming
- What is Extreme Programming? - Ron Jeffries
- Extreme Programming - Wikipedia
- About Extreme Programming - XP Alliance
TDD and BDD
- Test-Driven Development - Martin Fowler
- Introducing BDD - Dan North
- What’s in a Story? - Dan North
- BDD is like TDD if… - Dan North
- How Test-Driven Development Works - J.B. Rainsberger
- SbE: 10 Years Later - Gojko Adzic
- TDD vs BDD - Refine
- Example Mapping