Dave Farley describes approaches to acceptance testing that allow teams to work quickly and effectively, build functional test coverage, and maintain those tests through change.
Acceptance Tests are run in a production-like environment.
What’s so hard?
- tests break when the SUT changes (particularly UI)
- tests are complex to develop
- this is a problem of design: the tests are too coupled to the SUT
Who owns the tests?
we want to have lots of these tests: if it is a significant system you will have tens of thousands of these tests
common pattern: a story with acceptance criteria => at least one acceptance test per acceptance criterion
=> you build up coverage quite quickly
=> you will over-test (that is not a problem)
Developers own the Acceptance Tests. They own the responsibility to keep them working, because developers will break the tests as they add code, so it is their responsibility to fix the tests.
don’t have a separate QA team write the tests: as developers add code the tests break, and the QA team will not be able to catch up
=> you end up with tests that are broken all the time
Properties of Good Acceptance Tests
- focus on the “What” we want to assert, not the “How”
- Isolated from other tests: you want to run lots of them, and you want to run them in parallel
- Repeatable: you want to run them over and over again
- Uses the language of the problem domain => ubiquitous language
- Tests ANY change
- Efficient: we want to run thousands, tens of thousands of these tests
What not How
How = the typical record-and-playback style of UI testing
=> it breaks whenever the SUT changes
to avoid this, introduce a “Driver” (a level of indirection, also called “test infrastructure”) that talks to the system using the language of the problem domain: place order, create user, …
=> tests are written against the Driver
=> the Driver talks the protocol of the system
=> if code changes, we only have to change the Driver
Test Case Test Case Test Case
| | |
--------------|---------------
|
Test Infrastructure (Driver)
|
Public Interface
|
System under Test
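As a rough sketch (hypothetical names, JUnit 5 assumed), a test written against a Driver looks like this: the test case expresses only the “What” in domain language, and the Driver is the one place that knows how to talk to the system’s public interface.

import java.util.HashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical Driver: the only place that knows HOW to talk to the SUT.
// A real driver would call the system's public interface (REST, UI, FIX, ...);
// it is stubbed in memory here just to keep the sketch self-contained.
class TradingDriver {
    private final Map<String, Integer> openOrders = new HashMap<>();

    void createUser(String name)                             { openOrders.put(name, 0); }
    void placeOrder(String user, String instrument, int qty) { openOrders.merge(user, 1, Integer::sum); }
    int openOrdersFor(String user)                           { return openOrders.getOrDefault(user, 0); }
}

class PlaceOrderAcceptanceTest {
    private final TradingDriver trading = new TradingDriver();

    @Test
    void userCanPlaceAnOrder() {
        // WHAT we assert, in domain language: no selectors, no URLs, no protocol details
        trading.createUser("Dave");
        trading.placeOrder("Dave", "EUR/USD", 10);
        assertEquals(1, trading.openOrdersFor("Dave"));
    }
}

If the SUT’s UI or protocol changes, only the Driver changes; the test cases written against it stay the same.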
Separate Deployment from Testing
- traditionally it is fairly common to think that every test should control its start conditions and so should start and initialise the app
=> anti-pattern: we don’t want that, because it is inefficient
=> we want to separate the deployment from the testing, so we need to isolate differently
- acceptance test deployment should be a rehearsal for the production release
=> this is absolutely true: by the time we arrive in production, we want to have rehearsed the deployment over and over again
- this separation of concerns provides an opportunity for optimisation:
we deploy once, and then run lots of tests against that deployment, so we incur the deployment cost only once
=> parallel tests are run in a shared environment
=> lowers the test start-up overhead
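A minimal sketch of that separation, assuming JUnit 5 and a pipeline that has already deployed the SUT before the test stage runs: the tests never deploy anything themselves, they just point at the shared environment and run concurrently against it (parallel execution also needs junit.jupiter.execution.parallel.enabled=true; SUT_BASE_URL is a hypothetical variable name).

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;

// Base class for acceptance tests: no deployment here, the pipeline did that once.
@Execution(ExecutionMode.CONCURRENT)   // many test cases share the one deployed instance
abstract class AcceptanceTestCase {
    protected static String sutBaseUrl;

    @BeforeAll
    static void locateDeployedSystem() {
        // Pick up the environment the deployment stage prepared; fail fast if it is missing.
        sutBaseUrl = System.getenv("SUT_BASE_URL");
        if (sutBaseUrl == null) {
            throw new IllegalStateException("SUT_BASE_URL not set: deploy the SUT before running acceptance tests");
        }
    }
}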
Test Isolation
- any form of testing is about evaluating something in controlled circumstances
- isolation works on multiple levels:
  - isolating the SUT
  - isolating test cases from each other => allows running them in parallel
  - isolating test cases from themselves (temporal isolation) => allows repeatable results when a test case is run multiple times
- isolation is a vital part of your Test Strategy
Isolating the SUT
typical picture in many enterprise systems:
System A -> System B -> System C
you are working on System B
System B is in a chain of dependencies
it is fairly common in large enterprises to have an integration test environment where you glue all these things together,
and it is a requirement that you test in these circumstances.
this is problematic: if your mode of testing System B is to put input into System A, you don’t really have control over the input that goes into System B unless you really know what is going on inside System A
=> you don’t know what inputs to feed into System A to exercise System B the way you want
=> this assumes you have global knowledge of all the systems and an in-depth understanding of them: that is not going to happen
the system is not in a predictable, deterministic state: you cannot really know the state of the system, or whether the test will pass or fail => anti-pattern
what we really want is:
Test Cases -> SUT B -> Verifiable Output
we want control of that system, we want to interact with that system in realistic ways
interacting through the real communication channels of that system: the API, the UI
we don’t want special privileged access for the tests, but closed (black-box) access, so we can exercise the system as it would really be exercised
Validating the Interfaces
The trouble is: the reason people do these crazy things (like integration test environments) is that they worry about changes in the interfaces between the systems
that is a valid concern
what you really want is something like this:
Test Cases -> SUT B -> Verifiable Output
Test Cases -> External System A -> Verifiable Output
Test Cases -> External System C -> Verifiable Output
you want to test their interfaces too, but the number of tests needed to assert those interfaces is much, much smaller (compared with extending your whole test scope via integration tests)
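A sketch of such an interface test (hypothetical names, JUnit 5): a handful of narrowly focused tests pin down the parts of External System A’s interface that System B relies on, which is far cheaper than pulling the whole chain into your acceptance-test scope.

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertTrue;

// What System B receives from External System A (hypothetical shape).
record OrderFeedMessage(String orderId, String status) {}

// Hypothetical client for System A's interface, exactly as System B would use it.
// In a real interface test this would call System A's real endpoint; it is
// stubbed here so the sketch compiles on its own.
class SystemAClient {
    OrderFeedMessage latestOrder() { return new OrderFeedMessage("ORD-1", "ACCEPTED"); }
}

class SystemAInterfaceTest {
    @Test
    void orderFeedStillCarriesTheFieldsWeDependOn() {
        OrderFeedMessage msg = new SystemAClient().latestOrder();
        // Assert only the parts of the interface System B depends on.
        assertNotNull(msg.orderId());
        assertTrue(msg.status().matches("ACCEPTED|REJECTED|PENDING"));
    }
}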
Isolating Test Cases
- assuming multi-user systems …
if that is not the case, you can just spin up multiple instances of your application to obtain test isolation
- tests should be efficient - we want to run LOTS!
- what we really want is to deploy once, and run LOTS of tests
- so we must avoid ANY dependencies between tests … in terms of shared state, persistent state
what we are looking for is a way to isolate these tests that will allow us to run many of them in parallel, against the same instance of the application
=> a great way of doing that is using natural functional isolation, e.g. identify the boundaries in your problem domain that make sense and carve out your tests along these lines
if testing Amazon, create a new account and a new book/product for every test case
if testing eBay, create a new account and a new auction for every test case
if testing GitHub, create a new account and a new repository for every test case
=> you can set up these accounts, repositories, auctions, … in exactly the right state for your specific test case, use them inside the test case, and then get rid of them
=> this has a weird side effect: you tend to make the creation of accounts, … very efficient
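A sketch of that functional isolation (hypothetical driver, stubbed in memory): every test case creates its own account and its own product, sets them up exactly as it needs, and never touches data owned by other tests, so thousands of cases can run in parallel against one deployed instance.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical driver, stubbed in memory so the sketch compiles;
// a real one would drive the deployed SUT through its public interface.
class StoreDriver {
    private final Map<String, List<String>> orders = new HashMap<>();

    String createAccount(String name) {
        String id = name + "-" + UUID.randomUUID();   // unique per test case
        orders.put(id, new ArrayList<>());
        return id;
    }
    String createBook(String title)       { return title + "-" + UUID.randomUUID(); }
    void buy(String account, String book) { orders.get(account).add(book); }
    int ordersFor(String account)         { return orders.get(account).size(); }
}

class BuyBookAcceptanceTest {
    private final StoreDriver store = new StoreDriver();

    @Test
    void customerCanBuyABook() {
        // This test owns its account and its book: no shared state with any other test.
        String account = store.createAccount("Dave");
        String book = store.createBook("Continuous Delivery");
        store.buy(account, book);
        assertEquals(1, store.ordersFor(account));
    }
}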
Temporal Isolation
- we want repeatable results
- if I run my test case twice it should work both times: you want predictable results
Example:
create a book “Continuous Delivery” in the database
the next time you run the test, the book already exists and it will fail
=> Alias your functional-isolation entities
in your test case you create account “Dave”; in reality the test infrastructure asks the application to create account “Dave123” and aliases it to “Dave” inside the test infrastructure
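A minimal aliasing sketch (hypothetical names): the test case only ever talks about “Dave”; the test infrastructure maps that alias to a unique real name per run, which is what makes the same test repeatable.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical piece of test infrastructure that turns the test's logical
// names into unique real names, so re-running the test never collides
// with entities created by earlier runs.
class AliasRegistry {
    private static final AtomicLong COUNTER = new AtomicLong();
    private final Map<String, String> aliases = new HashMap<>();

    String realNameFor(String alias) {
        // "Dave" in the test case becomes e.g. "Dave123" in the application.
        return aliases.computeIfAbsent(alias, a -> a + COUNTER.incrementAndGet());
    }
}

// Usage inside a driver method (sketch):
//   String accountName = aliasRegistry.realNameFor("Dave");
//   backend.createAccount(accountName);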
Repeatable
Test Doubles
imagine your system has to communicate with the outside world
System
|
Local Interface to External System
|
Communications to External System
|
External System
=> you have the opportunity to plug in another implementation of the communication
=> a TestStub simulating the External System
=> using configuration we can switch between the real communication (for production) and the stub (for the test environment)
Test Case Test Case Test Case
| | |
--------------|---------------
|
Test Infrastructure (Driver) --|
| |
Public Interface |
| |
System under Test |
| |
Local Interface |
to External System |
| |
TestStub -------------|
the TestStub is part of the Test Infrastructure, and there is a back-channel communication
so I can collect results from the test stub and submit input via the test infrastructure
so I can express what I am expecting from external systems
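A sketch of such a stub (hypothetical interface and names): the SUT talks to it through the same local interface it would use for the real external system, while the test infrastructure uses the back channel to seed responses and to inspect what the SUT actually sent.

import java.util.ArrayList;
import java.util.List;

// The SUT's local interface to the external system (hypothetical).
interface PaymentGateway {
    String charge(String account, int amountInCents);
}

// TestStub: plugged in instead of the real gateway in the test environment.
class PaymentGatewayStub implements PaymentGateway {
    private final List<String> chargesSeen = new ArrayList<>();
    private String cannedResult = "APPROVED";

    @Override
    public String charge(String account, int amountInCents) {
        chargesSeen.add(account + ":" + amountInCents);   // record what the SUT sent
        return cannedResult;                              // respond as the external system would
    }

    // Back channel, used only by the test infrastructure:
    void willRespondWith(String result) { this.cannedResult = result; }
    List<String> chargesSentBySut()     { return chargesSeen; }
}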
Uses the language of the problem domain
=> DSL
A simple DSL solves many of our problems:
- ease of TestCase creation
- readability
- ease of maintenance
- separation of “What” from “How”
- test isolation
- the chance to abstract complex set-up and scenarios
- …
example: see the slides
The DSL is something you build over time; it is an evolutionary approach:
see how in the examples they evolved from using the tradingUI and the fixAPI specifically to using a Channel annotation
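Roughly what that kind of test reads like (not the actual LMAX DSL, just an illustrative sketch with hypothetical names): the channel is configuration, so the same specification can be exercised through the trading UI or the FIX API.

import org.junit.jupiter.api.Test;

// Hypothetical DSL facade over the test infrastructure.
class TradingDsl {
    private final String channel;                     // "ui", "fixApi", ...
    TradingDsl(String channel) { this.channel = channel; }

    TradingDsl createInstrument(String name)                  { /* drive the SUT via 'channel' */ return this; }
    TradingDsl placeOrder(String instrument, String details)  { /* drive the SUT via 'channel' */ return this; }
    TradingDsl confirmOrderPlaced(String instrument)          { /* assert via 'channel' */ return this; }
}

class PlaceOrderSpecification {
    // Which channel to use is configuration, not part of WHAT the test says.
    private final TradingDsl trading = new TradingDsl(System.getProperty("test.channel", "ui"));

    @Test
    void canPlaceABuyOrder() {
        trading.createInstrument("EUR/USD")
               .placeOrder("EUR/USD", "side: buy, quantity: 10, price: 1.1")
               .confirmOrderPlaced("EUR/USD");
    }
}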
Tests ANY change
Testing with Time
- TestCases should be deterministic
- but Time is a problem for determinism
=> 2 options:
- ignore time: you don’t validate the time values returned by the system
- control time: you deal with time explicitly
Ignore Time
Mechanism
- filter out time-based values in your test infrastructure so that they are ignored
Pros:
- simple!
Cons:
- can miss errors
- prevents any hope of testing complex time-based scenarios
Control Time
Mechanism
- treat time as if it were an external dependency, like any external system - and Fake it!
(treat the supply of time information as an external system, stub it, control it)
Pros:
- very flexible
- can simulate any time-based scenario, with time under the control of the test case
(you can test daylight-saving scenarios, or clearing examples where a trade clears 3 days later so you need to fast-forward 3 days, …)
Cons:
- slightly more complex test infrastructure
Again, you have a back channel in your Test Infrastructure that allows you to control time.
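A minimal sketch of controlling time, assuming the SUT takes a java.time.Clock instead of reading the system clock directly; the travel() back channel is a hypothetical name.

import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Test-infrastructure clock; in the test environment the SUT is configured
// to take its time from this instead of the real system clock.
class FakeClock extends Clock {
    private Instant now = Instant.parse("2024-01-01T00:00:00Z");

    @Override public Instant instant()           { return now; }
    @Override public ZoneId getZone()            { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; }

    // Back channel used by the test case, e.g. fast-forward 3 days to watch a trade clear.
    void travel(Duration howFar) { now = now.plus(howFar); }
}

// Usage in a test case (sketch): clock.travel(Duration.ofDays(3));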
Test Environment Types
- some tests need special treatment
we spoke a lot about sharing environments, which allows you to run lots of tests in parallel
for some kinds of tests this is hard to do (examples: time travelling, destructive tests, …)
- tag tests with properties and allocate them dynamically
your Test Infrastructure can use these tags to determine where to run each test case (in which environment)
for time travelling: each test case spins up a new instance of the application
for destructive tests: you want to kill some bits of the system to see if it is robust enough, …
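One way to express the tagging (a sketch using JUnit 5 tags; the actual environment allocation would live in your pipeline and test infrastructure, not in the test code):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class SettlementAcceptanceTest {

    @Test
    @Tag("time-travel")   // infrastructure routes this to a private, disposable instance
    void tradeClearsThreeDaysAfterExecution() {
        // ... uses the time back channel, so it cannot share an environment
    }

    @Test
    @Tag("destructive")   // infrastructure routes this to an environment it is allowed to break
    void systemRecoversWhenANodeIsKilled() {
        // ...
    }
}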
Efficient
we want to run thousands and tens of thousands of these tests
Production-like Test Environment
you need to represent the things that are important in production
Make TestCases internally synchronous
make sure each test is run efficiently
ideal cycle time:
imagine something really bad is happening in production and you are losing money: you still want to go through your pipeline to fix that problem and be confident of your fix
=> you don’t want the tests to take days to run
=> rule of thumb: in the order of 40 minutes, maybe 1 hour
One problem is asynchronous systems
- look for a “Concluding Event” and listen for that in your DSL to report an async call as complete
- if you really have to implement a “poll-and-timeout” mechanism, do it in your test infrastructure, not in your test case
looking for the concluding event is a much stronger strategy
- Never, Never, Never put a “wait(xx)” in a test and expect your tests to be (a) Reliable or (b) Efficient
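A sketch of keeping that wait in the infrastructure (hypothetical names; libraries such as Awaitility package up the same idea): the test case reads synchronously, while the driver waits for the concluding event with a bounded poll.

import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Lives in the test infrastructure, NOT in test cases.
final class WaitFor {
    static void concludingEvent(Supplier<Boolean> eventHasHappened, Duration timeout) {
        Instant deadline = Instant.now().plus(timeout);
        while (!eventHasHappened.get()) {
            if (Instant.now().isAfter(deadline)) {
                throw new AssertionError("Concluding event did not happen within " + timeout);
            }
            try { Thread.sleep(50); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting", e);
            }
        }
    }
}

// In a driver method, so the test case stays synchronous (sketch):
//   submitOrderAsync(order);
//   WaitFor.concludingEvent(() -> orderStatus(order).equals("CONFIRMED"), Duration.ofSeconds(30));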
Scaling-Up
if you have done all of the previous, this becomes much easier
over the course of your project, the duration of the test run will grow beyond the 40 minutes
=> buy hardware, VMs, go to Amazon, …
Anti-Patterns in Acceptance Testing
- Don’t use UI Record-and-playback Systems
they are too brittle, too coupled to the system
- Don’t Record-and-playback production data
this has a role, but it is NOT Acceptance Testing
it does not test the exception cases, and those are the ones that are going to nail you
- Don’t dump production data into your test systems; instead define the absolute minimum data that you need
because you want to be scalable, you want to be flexible, …
- Don’t assume Nasty Automated Testing Products(tm) will do what you need
be very sceptical about them: start with YOUR strategy and evaluate tools against that
- Don’t have a separate QA team. Quality is down to everyone.
Developers own Acceptance Testing
QA people are valuable, but don’t have them write acceptance tests separately from developers
- Don’t let every Test start and init the app
optimise for Cycle-Time, be efficient in your use of test environments
- Don’t include Systems outside of your control in your Acceptance Test Scope
- Don’t put wait() instructions in your tests hoping it will solve intermittency
Tricks for Success
- Do ensure that developers own the tests
- Do focus your tests on What not How
- Do think of your tests as “Executable Specifications”
- Do make Acceptance Testing part of your “Definition of Done”
- Do keep Tests isolated from one another
- Do keep your Tests repeatable
- Do use the Language of the problem domain - Do try the DSL approach
- Do stub External Systems
- Do test in “Production-Like” environments
- Do make instructions appear Synchronous at the level of the Test Case
- Do test for ANY change
you want very good confidence that the system is ok
we can never write enough tests to prove that the system is good
but we can write enough tests so that when one fails, it tells us the system is not good
- Do keep your Tests efficient
easy to write, easy to maintain
Questions
open source product: LMAX Simple DSL, it does the aliasing
- Cucumber?
not anti-Cucumber; it is just that if the developers own the tests, they will use the tools they are familiar with
- Write the test first? Yes
once the tests pass, you are done
- About parallelising tests and not starting and init-ing the app?
it is surely not always the right thing to do for all systems
if your system starts in under a second, you can certainly start it for every TestCase
- Resetting the database?
only after each deployment of the system, not in between TestCases
you want to start from a known state and run all schema migrations: start from a known data set