Figuring out how to manage test data can certainly be challenging. In order for an automated test to run, it needs to have some sort of data. What kind of data it needs will largely depend on the type of application being tested.
For an ecommerce site, an automated test might need a username and password, an item name to add to the cart, and a credit card number to use during checkout. However, these data items can change over time. Perhaps the database is refreshed, wiping out the user’s account. Perhaps the item being purchased is no longer available. Perhaps the credit card has expired.
Whatever the reason, the data used during a test can and will change over time and we need some way of dealing with it. There are four primary ways of dealing with fluctuating test data:
1 - Past State
Since tests need to use a specific data set, the first option is to simply set the application to a previous state instead of having the tests worry about data expiring. We can do this by saving a copy of the production database at a specific point in time and then restoring our testing environment’s database from this image before each test run. This ensures the data the test expects is always the same. This task can be performed by the IT department, developers, or QA engineers. It can also be automated with a CI server like Jenkins.
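As a rough illustration of the refresh step, here is a minimal Python sketch that uses a SQLite file as a stand-in for the real database. The file names, the `users` table, and the helper functions are all assumptions made for the example. A real environment would more likely use its database vendor's backup and restore tooling.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_snapshot(path: Path) -> None:
    """Build the known-good image (a stand-in for a saved production copy)."""
    conn = sqlite3.connect(str(path))
    try:
        conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
        conn.execute("INSERT INTO users VALUES ('test_shopper')")
        conn.commit()
    finally:
        conn.close()


def restore_snapshot(snapshot: Path, test_db: Path) -> None:
    """Overwrite the test database with the saved image."""
    shutil.copyfile(snapshot, test_db)


def usernames(db_path: Path) -> list[str]:
    """Return every username currently stored in the database."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute("SELECT username FROM users ORDER BY username")
        return [row[0] for row in rows]
    finally:
        conn.close()


def demo() -> list[str]:
    """Wipe the test data, then show that a restore brings it back."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot.db"
        test_db = Path(tmp) / "test.db"
        create_snapshot(snapshot)
        restore_snapshot(snapshot, test_db)

        # Simulate a test run that destroys the data it depends on.
        conn = sqlite3.connect(str(test_db))
        conn.execute("DELETE FROM users")
        conn.commit()
        conn.close()

        # Refreshing from the image before the next run restores the data.
        restore_snapshot(snapshot, test_db)
        return usernames(test_db)
```

A CI job could call something like `restore_snapshot` as its first step so that every run starts from the same image.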
2 - Get State
The second option is to fetch the application’s current state and pass that data into our test automation. We can do this by reading from the database, scraping information off the GUI, or calling a service. For example, before an automated test attempts to add an item to a cart, it makes an HTTP request to a REST service to get the list of currently active items. We now have a set of items we know should be valid. Fetching test data can be done before each test, if the automation tool supports it. Alternatively, a daily job can be scheduled to store the current data in a file, which the automated suite then parses.
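The REST example above might look something like the following Python sketch. The JSON shape (a list of objects with `name` and `active` fields) and the URL passed to `fetch_active_items` are assumptions for illustration. The real service will have its own contract.

```python
import json
import urllib.request


def parse_active_items(payload: str) -> list[str]:
    """Return the names of items flagged as active in a JSON payload."""
    items = json.loads(payload)
    return [item["name"] for item in items if item.get("active")]


def fetch_active_items(url: str) -> list[str]:
    """Call the (hypothetical) item service and return its active item names."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_active_items(response.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP call means the same `parse_active_items` function can also read the file written by a daily job.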
3 - Set State
The third option is a second set of automated scripts that exists simply to create the data needed through the application’s GUI. These scripts may run against a back-end application different from the end-user application. For example, many web applications have an administrative GUI that can be used to create and delete items in the system. We can automate this application to create the users and items needed for the automated tests. Because this is a separate set of scripts, it only needs to be run periodically: after a database refresh, when a new environment is spun up, and so on.
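One way to structure such a setup script is to make it safe to rerun, creating only the data that is missing. The Python sketch below assumes a hypothetical `FakeAdminConsole` class standing in for the administrative application. In practice, its two methods would drive the admin GUI through whatever browser automation tool the team already uses.

```python
class FakeAdminConsole:
    """Hypothetical in-memory stand-in for the administrative application."""

    def __init__(self, existing=()):
        self.users = set(existing)

    def user_exists(self, username: str) -> bool:
        return username in self.users

    def create_user(self, username: str) -> None:
        self.users.add(username)


def seed_users(admin, required):
    """Create any required user that is missing; return the names created."""
    created = []
    for username in required:
        if not admin.user_exists(username):
            admin.create_user(username)
            created.append(username)
    return created
```

Because `seed_users` skips users that already exist, running it again after a partial failure or on an already seeded environment does no harm.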
4 - Zero Sum
In this approach, we create and delete our data as part of the test. If the scripts require a username and password, instead of assuming that the user already exists, they create the user as the first step and delete the user as the last step. This, of course, assumes that full CRUD (Create, Read, Update, Delete) functionality is available in the application. This may seem like additional work, but this functionality needs to be tested anyway. For example, we create four tests that run sequentially: the first creates a new item in the system; the second reads that item and verifies the data is correct; the third updates the item and verifies the update was successful; and the fourth deletes the item and confirms the item is gone.
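The create-first, delete-last pattern can be sketched with Python's `unittest` module. The `InMemoryUserStore` class, the shared `STORE` instance, and the user name are assumptions that stand in for the application under test. This variation creates and removes the user around each test rather than chaining four dependent tests.

```python
import unittest


class InMemoryUserStore:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self.users = {}

    def create(self, username, password):
        if username in self.users:
            raise ValueError(f"{username} already exists")
        self.users[username] = password

    def read(self, username):
        return self.users.get(username)

    def update(self, username, password):
        if username not in self.users:
            raise KeyError(username)
        self.users[username] = password

    def delete(self, username):
        del self.users[username]


STORE = InMemoryUserStore()


class ZeroSumUserTest(unittest.TestCase):
    def setUp(self):
        # First step: create the data this test needs.
        STORE.create("zero_sum_user", "s3cret")
        # Last step: registered cleanups run even if the test body fails.
        self.addCleanup(STORE.delete, "zero_sum_user")

    def test_user_can_be_read(self):
        self.assertEqual(STORE.read("zero_sum_user"), "s3cret")

    def test_password_can_be_updated(self):
        STORE.update("zero_sum_user", "n3w-s3cret")
        self.assertEqual(STORE.read("zero_sum_user"), "n3w-s3cret")
```

Registering the delete with `addCleanup` rather than placing it at the end of the test body keeps the store empty after a failed assertion, so one broken test does not leave data behind for the next.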
Whenever possible, a Zero Sum approach is best, as it is self-contained, very efficient, and almost forces us to have good test coverage. However, most of the time a hybrid approach is needed, combining more than one of these options. For example, we might have a ‘Set State’ style script that creates the cross-test data, such as users. Then each test would try to be ‘Zero Sum’ with respect to its own data. Using the right combination of approaches can drastically reduce test execution time while also increasing coverage and efficiency.