Why do we need a test data catalogue?

Test data catalogues are widely used. The bigger and more complex the application under test and its landscape, the more important a well-maintained test data catalogue becomes. When we test complex applications (web, API, SOAP, etc.) we usually need hundreds of steps with countless variations. One approach to handling this complexity is to divide the huge task into separate test scenarios. Each scenario, in turn, usually holds countless variations of test data.

As the number of scenarios grows, we need a strategy to keep the test data catalogue maintainable. For example, if one major dimension is updated, we don’t want to apply that update manually to tens or hundreds of files.

Simplified real world example

Our task is to create end-to-end tests for a site such as ebookers.com. When we look at the page from a test automation perspective, we see a lot of work coming up. In our future test data catalogue we will need combinations of the following data dimensions:

  • Locations (used for origin and destination)
  • Classes (e.g. Economy, Economy Plus, Business, First class)
  • User-Account logins (e.g. Company, private, Faker-data for random new customer)
  • Car-Types (e.g. Group-Code)

Our TestDataGenerator will need at least some of these test variations:

  • Arrangements (Flight only, Flight+Hotel, Flight+car, Car+Hotel, etc.)
  • Number of persons (1..10 adults, 1..7 kids)
  • Duration of the arrangement (random from 3 to e.g. 21 days)

Build a test data catalogue based on scenarios

In reality your TestDataGenerator would have plenty of columns and use many dimensions to create a list of possible data combinations. But for the sake of the example, the 4 static dimensions and 3 variable inputs listed above are enough.
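To get a feeling for how quickly the combinations grow, here is a minimal Python sketch. It is not how baangt's TestDataGenerator works internally; it only illustrates the idea of crossing static dimensions and adding random variable inputs. All sample values and the function name build_combinations are assumptions made up for this illustration.

```python
import itertools
import random

# Hypothetical sample values; a real catalogue would hold far more.
locations = ["VIE", "LHR", "JFK"]
classes = ["Economy", "Economy Plus", "Business", "First class"]
arrangements = ["Flight only", "Flight+Hotel", "Flight+Car", "Car+Hotel"]


def build_combinations(seed=0):
    """Cross the static dimensions and add random variable inputs."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    records = []
    for origin, destination, travel_class, arrangement in itertools.product(
        locations, locations, classes, arrangements
    ):
        if origin == destination:
            continue  # a trip needs two different locations
        records.append(
            {
                "origin": origin,
                "destination": destination,
                "class": travel_class,
                "arrangement": arrangement,
                "adults": rng.randint(1, 10),
                "kids": rng.randint(1, 7),
                "duration_days": rng.randint(3, 21),
            }
        )
    return records


records = build_combinations()
print(len(records))  # 6 location pairs x 4 classes x 4 arrangements = 96
```

Even with only three locations we already get 96 records, which shows why the number of files should not grow with the number of combinations.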

Grouping to reduce complexity

Let’s assume that you group your test scenarios by arrangements. The input data and user journey differ for each arrangement, which makes this attribute a great grouping element. Following this thought, you will want the test data for each of the arrangements in a separate test data catalogue.

Grouping brings new challenges

Except… you don’t really want to! For all arrangements that include a car, for instance, you want a central place to maintain car types. The same goes for all arrangements that include flights: you will not want to maintain all possible flight classes in many different test data catalogues. If you do it anyway (e.g. because your test automation software doesn’t make it easy otherwise), you will waste time on every change. And not only on the changes: you will waste even more time debugging a failed test case, only to find out that one of the many files is missing the latest update.

Multiple data sources solve the challenge

As laid out above, we need a solution to maintain different parts of the test data catalogue in different ways. Again, this is mostly relevant in larger applications and larger system landscapes. If you test a fairly simple SPA with one main data flow, breaking up your test data catalogue could have a negative effect on your productivity. “Keep it as simple as possible, but not simpler”, as a saying often attributed to Einstein goes. This is also true in test data management.

Mix and merge

Mix data from multiple sources

Sometimes when we have a split test data catalogue we simply want a random record from the source values. Sometimes we need to be very specific. In the above example we might not care so much about the flight classes, but we might have a scenario and user journey that specifically targets company users of the site. Thus we want “any” of the flight classes, but ONLY those user account records that are companies.
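The difference between “any record” and “only matching records” can be sketched in a few lines of Python. This is an illustration under assumptions, not baangt code: the sample data, the field names and the helper pick are all hypothetical.

```python
import random

# Hypothetical master data; in practice each list comes from its own source.
flight_classes = [
    {"name": "Economy"},
    {"name": "Economy Plus"},
    {"name": "Business"},
    {"name": "First class"},
]
user_accounts = [
    {"login": "acme-corp", "account_type": "Company"},
    {"login": "jane.doe", "account_type": "Private"},
    {"login": "globex", "account_type": "Company"},
]


def pick(records, rng, criteria=None):
    """Return a random record; with criteria, only matching records qualify."""
    criteria = criteria or {}
    matching = [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]
    if not matching:
        raise LookupError(f"no record matches {criteria}")
    return rng.choice(matching)


rng = random.Random(42)
test_record = {
    # any flight class will do
    "flight_class": pick(flight_classes, rng)["name"],
    # but the login must belong to a company account
    "login": pick(user_accounts, rng, {"account_type": "Company"})["login"],
}
print(test_record)
```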

Merge data into test records

From these company account master data records we might want to take a specific attribute (e.g. number of employees), change it during the test case and assert that the application behaves as expected. One source dimension may contain many attributes which are used to enrich the data of a single test record. Examples include gender and age of business partner records. Those can be used to fill in form data of a web application.
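As a rough sketch of this merge step: we copy selected attributes from a master data record into the test record and then derive the value the test case will change. The record contents and the helper merge_into_record are assumptions for illustration only.

```python
# Hypothetical company master data record.
company = {
    "login": "acme-corp",
    "account_type": "Company",
    "employees": 250,
    "vat_number": "ATU00000000",
}


def merge_into_record(test_record, source, fields):
    """Return a copy of test_record enriched with the chosen source fields."""
    merged = dict(test_record)  # leave the caller's record untouched
    for field in fields:
        merged[field] = source[field]
    return merged


record = merge_into_record(
    {"arrangement": "Flight+Hotel"}, company, ["login", "employees"]
)

# The test case changes one attribute and later asserts on the new value.
changed = {**record, "employees": record["employees"] + 1}
print(record)
print(changed)
```

Because the helper works on copies, the master data record stays unchanged and can be reused by the next test case.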

Solution abstract

When we think about the pain of maintaining the same data in different test data sets, we think of normalization. In relational databases we would create separate tables for those records and “join” them into our test data. In the event of a change (or addition), we have one central place for the adjustment. Wherever this source data is used we would see the new records. That’s a great, proven concept.
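For readers who prefer to see the relational idea in action, here is a small sketch using Python's built-in sqlite3 module. The table names, columns and values are invented for this example. The point is only that one update in the central table shows up in every joined test record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE car_types (
        group_code  TEXT PRIMARY KEY,
        description TEXT
    );
    CREATE TABLE test_cases (
        id          INTEGER PRIMARY KEY,
        arrangement TEXT,
        group_code  TEXT REFERENCES car_types (group_code)
    );
    INSERT INTO car_types VALUES ('A', 'Mini'), ('B', 'Compact');
    INSERT INTO test_cases (arrangement, group_code)
    VALUES ('Flight+Car', 'A'), ('Car+Hotel', 'B'), ('Car+Hotel', 'A');
    """
)


def joined_rows():
    """Join every test case with its centrally maintained car type."""
    return conn.execute(
        """
        SELECT t.arrangement, c.group_code, c.description
        FROM test_cases AS t
        JOIN car_types AS c ON c.group_code = t.group_code
        ORDER BY t.id
        """
    ).fetchall()


before = joined_rows()
# One central change ...
conn.execute("UPDATE car_types SET description = 'Economy' WHERE group_code = 'A'")
# ... is visible in every test record that references group code 'A'.
after = joined_rows()
print(before)
print(after)
```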

Solution for baangt

It’s easy to make good decisions when there are no bad options

As you know, with the baangt test automation suite we focus on simplicity. Maintaining a separate database for some entities is not something the average colleague in a business department would do or have access to. But Microsoft Excel is almost always within reach. In baangt we enable business people to combine the best of the two worlds:

  • Central maintenance of common data records for a test data catalogue
  • Simple and versatile process using Microsoft Excel

How does it work in baangt?

A simple solution for small(er) scenarios

You might be familiar with the TestDataGenerator functionality. It’s a great tool to generate 10, 100, 1000 or millions of test records based on a simple syntax in an Excel sheet. Combining common data records of a test data catalogue with dynamic data for individual test cases follows the same principle and the same syntax:

Using RRD_ Syntax (Remote ReaD) we could for instance say:

  • RRD_(Partner, *, *) to get a random record from the sheet “Partner”. That might be good for some use cases.
  • RRD_(Partner, *, [Account-Type:Business]). This is more like it. We can get the desired corporate customers.
  • RRD_(Partner, [Account-Type, Account-Number, VAT-Number], [Account-Type:Business]). Now we get the same records, but instead of copying all fields from the source sheet we take only 3 attributes.
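To make the three variants above more tangible, here is a Python sketch that mimics their behaviour on in-memory data. It is explicitly not baangt's implementation. The function remote_read, the dictionary sheets and the sample rows are assumptions for illustration; only the sheet name “Partner” and the three field names come from the examples above.

```python
import random

# Hypothetical stand-in for the "Partner" sheet of an Excel workbook.
sheets = {
    "Partner": [
        {"Name": "ACME", "Account-Type": "Business",
         "Account-Number": "1001", "VAT-Number": "ATU00000001"},
        {"Name": "Jane Doe", "Account-Type": "Private",
         "Account-Number": "1002", "VAT-Number": ""},
        {"Name": "Globex", "Account-Type": "Business",
         "Account-Number": "1003", "VAT-Number": "ATU00000003"},
    ]
}


def remote_read(sheet, fields="*", filters="*", rng=random):
    """Pick a random row from a sheet, optionally filtered and projected."""
    rows = sheets[sheet]
    if filters != "*":
        rows = [
            row for row in rows
            if all(row.get(key) == value for key, value in filters.items())
        ]
    if not rows:
        raise LookupError(f"no row in {sheet!r} matches {filters}")
    row = rng.choice(rows)
    if fields == "*":
        return dict(row)  # all columns
    return {name: row[name] for name in fields}  # only the requested columns


# Roughly RRD_(Partner, *, *)
print(remote_read("Partner"))
# Roughly RRD_(Partner, *, [Account-Type:Business])
print(remote_read("Partner", "*", {"Account-Type": "Business"}))
# Roughly RRD_(Partner, [Account-Type, Account-Number, VAT-Number], [Account-Type:Business])
print(remote_read(
    "Partner",
    ["Account-Type", "Account-Number", "VAT-Number"],
    {"Account-Type": "Business"},
))
```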

For RRD_ to work, we need all data in the same Excel file. This works great if you have a limited number of test scenarios, each one in its own sheet of a larger Excel file. Typically, besides the scenario sheets, you will have 5 or 10 additional sheets with master data records. That way all test scenarios share common data. If you need to change, add or delete master data, you do it in only one place. That’s simple and easy. This approach also reduces the chance of dealing with corrupt or outdated test data.

Larger scenarios work well too!

In larger setups you’d work with separate files for scenarios and master data for maximum flexibility. Here the RRE_ (Remote Read External) command is for you! It is as powerful as RRD_ (see above); you simply add the file path and file name as the first parameter:

  • RRE_(Partner.xlsx,Partner,*,*): gets a random record from the file “Partner.xlsx”, sheet “Partner”.

All the other examples work as described above.
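To show how the parameters of both commands line up, here is a small Python sketch that takes such an expression apart. It is based only on the examples in this article. baangt's real parser may accept more variants or behave differently, and the function names split_top_level and parse_remote_read are made up for this illustration.

```python
def split_top_level(text):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_fields(text):
    """'*' stays '*'; '[A, B]' becomes ['A', 'B']."""
    if text == "*":
        return "*"
    return [item.strip() for item in text.strip("[]").split(",")]


def parse_filters(text):
    """'*' stays '*'; '[Key:Value]' becomes {'Key': 'Value'}."""
    if text == "*":
        return "*"
    filters = {}
    for item in text.strip("[]").split(","):
        key, _, value = item.partition(":")
        filters[key.strip()] = value.strip()
    return filters


def parse_remote_read(expression):
    """Break an RRD_/RRE_ expression into file, sheet, fields and filters."""
    prefix, _, rest = expression.strip().partition("(")
    if prefix not in ("RRD_", "RRE_") or not rest.endswith(")"):
        raise ValueError(f"not an RRD_/RRE_ expression: {expression!r}")
    args = split_top_level(rest[:-1])
    expected = 4 if prefix == "RRE_" else 3
    if len(args) != expected:
        raise ValueError(f"{prefix} expects {expected} parameters, got {len(args)}")
    # RRE_ carries the file as an extra first parameter; RRD_ has none.
    file_name = args.pop(0) if prefix == "RRE_" else None
    sheet, fields, filters = args
    return {
        "file": file_name,
        "sheet": sheet,
        "fields": parse_fields(fields),
        "filters": parse_filters(filters),
    }


print(parse_remote_read("RRE_(Partner.xlsx,Partner,*,*)"))
print(parse_remote_read(
    "RRD_(Partner, [Account-Type, Account-Number, VAT-Number], [Account-Type:Business])"
))
```

Apart from the additional file parameter, both commands decompose into the same sheet, fields and filter parts.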

Spend more time on test cases and less time on data management

We’ve seen at least two ways to build simple yet useful test data catalogues. Data changes are inevitable, and so is maintaining them. In this article we covered ways to considerably simplify data management. Do you have any feedback or ideas on how to improve it further? Leave a comment!