Test Data Management - One of the Biggest Challenges in Testing


Test Data Management is one of the biggest challenges in testing. Building a sustainable solution requires the right strategy.

The Challenges of Test Data Management

Test Data Management is one of the biggest challenges we face in testing. In most organizations, Test Data Management focuses on compliance rather than on the value it adds to the testing process. But compliance alone delivers only limited real benefit. If we look at the software development lifecycle and the testing process, we see that software quality and testing depend on the test data available for testing activities. The right test data approach is also a prerequisite for a continuous delivery pipeline and for test automation. It is a driver of continuous quality in every company.

What is Test Data Management?

Test Data Management is not just about how test data is provided for testing. It is also the discipline of organizing data provisioning with tools, using data efficiently in the test process, and staying compliant with regulations such as the GDPR. Test Data Management is the process of planning and handling test data through four phases: Scope & Plan, Design & Build, Manage & Control, and Decommission.
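The four phases can be sketched as an ordered lifecycle that each test data set passes through. This is a minimal illustration, not part of the ACT Framework itself. The names `TdmPhase` and `next_phase` are hypothetical, and the strictly linear progression is a simplifying assumption.

```python
from enum import Enum
from typing import Optional


class TdmPhase(Enum):
    """The four Test Data Management phases, in lifecycle order."""
    SCOPE_AND_PLAN = 1
    DESIGN_AND_BUILD = 2
    MANAGE_AND_CONTROL = 3
    DECOMMISSION = 4


def next_phase(phase: TdmPhase) -> Optional[TdmPhase]:
    """Return the phase that follows `phase`, or None once the data is decommissioned.

    Assumes a strictly linear lifecycle (a simplification).
    """
    ordered = list(TdmPhase)  # Enum iteration preserves definition order
    position = ordered.index(phase)
    return ordered[position + 1] if position + 1 < len(ordered) else None


# Walk a data set through its whole lifecycle.
current: Optional[TdmPhase] = TdmPhase.SCOPE_AND_PLAN
while current is not None:
    print(current.name)
    current = next_phase(current)
```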

Figure: The ACT Framework - Test Data Management (Expleo)


Challenges in Test Data Management

Test Data Management starts with defining the data in the test case design. This is the foundation for high data availability and an automated process. In detail, several questions need answers:

- Which test stage is this, and what level of data integrity does it require?
- Does the test consume the data?
- How often must the data be recreated?

The answers determine which provisioning method to use. When can data be provided synthetically, and when does it have to be anonymized? Sometimes synthetic data cannot be provided at the level of integrity that is needed.
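The questions above can be turned into an explicit, reviewable decision rule. The sketch below is only an illustration under stated assumptions. The following values are all hypothetical, and each organization would have to define them for its own landscape:

- the stage labels
- the three-level integrity scale
- the `MAX_SYNTHETIC_INTEGRITY` limit
- the recreation threshold of 5

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    SYNTHETIC = "synthetic"
    ANONYMIZED = "anonymized"


@dataclass
class DataNeed:
    test_stage: str               # hypothetical labels: "unit", "integration", "system", "uat"
    integrity_level: int          # assumed scale: 1 = single record ... 3 = end-to-end across systems
    consumed_by_test: bool        # True if the test changes or uses up the data
    recreations_per_sprint: int   # how often the data has to be provided again


# Assumption: in this landscape, synthetic generation only reaches integrity level 2.
MAX_SYNTHETIC_INTEGRITY = 2


def choose_method(need: DataNeed) -> Method:
    """Prefer synthetic data; fall back to anonymized production data when integrity demands it."""
    if need.test_stage == "uat" or need.integrity_level > MAX_SYNTHETIC_INTEGRITY:
        return Method.ANONYMIZED
    return Method.SYNTHETIC


def needs_automated_recreation(need: DataNeed, threshold: int = 5) -> bool:
    """Consumed data that must be recreated often is a candidate for automated provisioning."""
    return need.consumed_by_test and need.recreations_per_sprint >= threshold
```

The rule deliberately prefers synthetic data and only switches to anonymized data when the required integrity exceeds what synthetic generation can deliver.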

The demand for data consistency has an impact on the scope of Test Data Management. For data provisioning it’s essential to know:

    • how environments are designed
    • which applications are integrated
    • which interfaces exist
    • which databases and file systems are involved, and
    • whether mock-ups or virtualized services are needed or already in use.
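One way to make this scope explicit is to record it per environment and derive where data has to be provisioned. In this minimal sketch, `EnvironmentScope` and its fields are hypothetical. It assumes that virtualized interfaces are served by mock responses and therefore need no backend data.

```python
from dataclasses import dataclass, field


@dataclass
class EnvironmentScope:
    name: str
    applications: list[str] = field(default_factory=list)   # integrated applications
    interfaces: list[str] = field(default_factory=list)     # interfaces between them
    datastores: list[str] = field(default_factory=list)     # databases and file systems
    virtualized: set[str] = field(default_factory=set)      # interfaces replaced by mocks or service virtualization

    def real_interfaces(self) -> list[str]:
        """Interfaces that talk to a real backend and therefore need consistent data behind them."""
        return [i for i in self.interfaces if i not in self.virtualized]

    def provisioning_targets(self) -> list[str]:
        """Every datastore plus the backend of every non-virtualized interface."""
        return list(self.datastores) + [f"backend of {i}" for i in self.real_interfaces()]


sit = EnvironmentScope(
    name="SIT",
    applications=["core banking", "CRM"],
    interfaces=["payments-api", "address-api"],
    datastores=["core-db", "/data/exports"],
    virtualized={"address-api"},
)
print(sit.provisioning_targets())
# ['core-db', '/data/exports', 'backend of payments-api']
```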

Test data provisioning is one of the core requirements for agility with a focus on fast delivery and high quality. Automated provisioning approaches are no longer just a dream; they are becoming reality. We need initial data sets, deployed together with the environment, to start automated tests and to support shift-left practices.

This also requires a change of mindset in how test cases are described. The test case must trigger the test data request, not the other way round. Teams should not look for data first and then describe a test case around it.
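A minimal sketch of this mindset: the test case design declares its data requirements, and the data requests are derived from it rather than the other way round. All class and field names here (`DataRequirement`, `DesignedTestCase`, `DataRequest`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataRequirement:
    entity: str            # e.g. "customer", "credit card"
    criteria: tuple = ()   # (attribute, value) pairs describing the required state


@dataclass
class DesignedTestCase:
    test_id: str
    title: str
    data_requirements: list[DataRequirement] = field(default_factory=list)


@dataclass
class DataRequest:
    requested_by: str      # the test case that triggers the request
    entity: str
    criteria: dict


def requests_for(test_case: DesignedTestCase) -> list[DataRequest]:
    """Derive one data request per requirement declared in the test case design."""
    return [
        DataRequest(test_case.test_id, req.entity, dict(req.criteria))
        for req in test_case.data_requirements
    ]


case = DesignedTestCase(
    "TC-042",
    "Block a lost credit card",
    [
        DataRequirement("customer", (("status", "active"),)),
        DataRequirement("credit card", (("state", "issued"),)),
    ],
)
for request in requests_for(case):
    print(request)
```

Because every request carries the ID of the test case that caused it, requests can later be traced back to test cases.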


How Could Test Data Be Provided Automatically?

When we receive a test data request, the first step is to decide whether the data should be created synthetically or anonymized.

For synthetic provisioning we can use GUI automation (if a GUI is available) or web service calls with the right data structures. With both approaches, scripts can be included in the automated deployment process. This ensures the right data is in place after each deployment, ready for the first test of a build. Executables can also be used in the deployment process, such as the autoloader in Tosca TDM, to load this data directly into the database or to execute web service requests.
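The web service variant can be sketched as a pipeline step that runs after deployment. This sketch uses only Python's standard library. The base URL, the `customers` resource and the record layout are hypothetical placeholders for the real system under test, and the sketch does not model the Tosca TDM autoloader.

```python
import json
import urllib.request
from typing import Callable, Iterable

# Hypothetical base URL of the system under test; replace with the real service endpoint.
BASE_URL = "http://sut.example.test/api"


def build_seed_requests(resource: str, records: Iterable[dict]) -> list[urllib.request.Request]:
    """Build one JSON POST request per synthetic record for the given (hypothetical) resource."""
    return [
        urllib.request.Request(
            url=f"{BASE_URL}/{resource}",
            data=json.dumps(record).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for record in records
    ]


def seed_after_deployment(
    requests: list[urllib.request.Request],
    send: Callable = urllib.request.urlopen,
) -> int:
    """Send the requests in order and return how many were sent.

    urlopen raises HTTPError on 4xx/5xx responses, so a failed seed aborts the pipeline step
    instead of letting the first test of the build run against missing data.
    """
    sent = 0
    for request in requests:
        with send(request, timeout=10):
            sent += 1
    return sent
```

In a pipeline, the step would call `seed_after_deployment(build_seed_requests("customers", records))` right after the deployment job. The `send` parameter exists so the step can be tested without a running system.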

These executables can also be included in automated test cases. The same approach can be used to load data copies under a golden source and archiving approach, in which data backups are stored and archived regularly. This makes it possible to set up environments quickly, even in complex and deeply integrated landscapes. The data copies can serve as a repository that is built up during a sprint, so a data supply is ensured at all times.
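The golden source idea can be illustrated with SQLite, whose online backup API copies a complete database. SQLite is only a stand-in here, chosen to make the example runnable. Production-like landscapes would use the backup and restore tooling of their actual databases, and `archive_golden_copy` and `restore_golden_copy` are hypothetical names.

```python
import sqlite3


def archive_golden_copy(source: sqlite3.Connection, archive_path: str) -> None:
    """Store a complete copy of the current test database as a golden source snapshot."""
    archive = sqlite3.connect(archive_path)
    try:
        source.backup(archive)  # copies every page of the source database into the archive file
    finally:
        archive.close()


def restore_golden_copy(archive_path: str, target: sqlite3.Connection) -> None:
    """Overwrite the target environment's database with the archived snapshot.

    The target must not have an open transaction (commit first).
    """
    archive = sqlite3.connect(archive_path)
    try:
        archive.backup(target)
    finally:
        archive.close()
```

Restoring a snapshot replaces the whole database in one step, which is what makes fast environment setup possible, compared with replaying individual data requests.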

As we move towards user acceptance testing, the required data integrity increases, and so does the demand for production-like data. To stay compliant, data taken from production systems must be anonymized to meet GDPR requirements, and a tool is indispensable for this. Such anonymizations are normally run at defined intervals because they usually take too long to fit into a continuous delivery pipeline.
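The sketch below illustrates one building block of such tooling: replacing identifiers with keyed hashes (HMAC-SHA256). The same input and key always yield the same token, so a given customer gets the same pseudonym in every system. This consistency is what teams lose when each one uses a different anonymization approach.

Note the limits. This is pseudonymization, not full anonymization, and under the GDPR pseudonymized data still counts as personal data, so it would not be sufficient on its own. The record layout and field names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes, prefix: str = "ID") -> str:
    """Replace a value with a keyed hash: same value + same key -> same token in every system.

    Assumption: the key is managed centrally and never stored alongside the test data.
    12 hex characters keep tokens short; use more for large volumes to reduce collision risk.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def anonymize_customer(row: dict, key: bytes) -> dict:
    """Pseudonymize the direct identifiers of a (hypothetical) customer record; other fields are kept."""
    masked = dict(row)
    masked["customer_id"] = pseudonymize(row["customer_id"], key, "CUST")
    masked["name"] = pseudonymize(row["name"], key, "NAME")
    # Keep a syntactically valid address on a reserved domain so no real mail can be sent.
    masked["email"] = pseudonymize(row["email"], key, "MAIL").lower() + "@example.invalid"
    return masked
```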


How Can Test Data Management Be Organized?

One of the questions companies deal with is: how should Test Data Management be organized?

This is not an easy question, because the answer depends on the demand for test data. Most of our clients organize Test Data Management in a separate Service Organization. This Service Organization can be centralized or decentralized. The choice depends on how much know-how exists in the teams and whether the teams are able to generate data via interfaces. For single components, it is usually more efficient to handle data management within the agile teams, using their own choice of tools.

The disadvantage is that data management know-how stays isolated in that particular team, and no one has a comprehensive view of data management. A further challenge arises when teams use different approaches: different anonymization methods lead to inconsistent data across teams.

If teams share the same environment, only agreed baselines should be committed for data provisioning between them. The higher the level of integration, the more sense it makes to organize Test Data Management in a central Service Organization. Providing the right data then requires not only technical knowledge of all systems but also comprehensive business know-how.

The Service Organization acts as an expert organization and takes full ownership of all test data approaches. Together with enterprise architecture, it decides on the tooling for test data provisioning across the whole landscape and delivers the tools as a service to the test organization. This keeps the complexity of the landscape and the data structures manageable. Because data volumes have been rising in recent years, this service is becoming increasingly important. Another benefit is that technical and business inconsistencies (potential showstoppers) can be communicated quickly and directly to the test team.

From a process perspective, the Service Organization receives service requests from all requestors and prepares the environment and data according to each request. We also recommend setting up a service catalogue for test data that defines standard and non-standard data.

Standard data can be provided easily and automatically. Creating non-standard data requires more effort, and this kind of data is not requested often enough to justify defining it as standard. The requestor chooses between these two types and starts the request. The set of standard data is derived from how often specific data is requested and normally needs to be aligned with the business. For this reason, the catalogue should be updated regularly. Additional services can also be added to it, such as data search with predefined scripts or data reservation.
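The catalogue logic described above can be sketched in three parts:

- Standard items are reserved automatically from a pool of prepared records.
- Non-standard items fall through to a manual order.
- Request counts show which non-standard items might be promoted to standard data.

`ServiceCatalogue`, `CatalogueEntry` and the example items are hypothetical. A real catalogue would sit behind a request portal and a database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogueEntry:
    name: str
    standard: bool                                       # standard data is provided automatically
    available: list[str] = field(default_factory=list)   # ready-made record ids (standard data only)


@dataclass
class ServiceCatalogue:
    entries: dict[str, CatalogueEntry]
    reservations: dict[str, str] = field(default_factory=dict)     # record id -> requestor
    request_counts: dict[str, int] = field(default_factory=dict)   # catalogue item -> number of requests

    def request(self, item: str, requestor: str) -> Optional[str]:
        """Reserve a free standard record and return its id.

        Returns None for non-standard items (handled as a manual order) or when no free record is left.
        """
        entry = self.entries[item]
        self.request_counts[item] = self.request_counts.get(item, 0) + 1
        if entry.standard:
            for record_id in entry.available:
                if record_id not in self.reservations:
                    self.reservations[record_id] = requestor
                    return record_id
        return None

    def promotion_candidates(self, threshold: int) -> list[str]:
        """Non-standard items requested at least `threshold` times: candidates to align with the business."""
        return [
            name
            for name, entry in self.entries.items()
            if not entry.standard and self.request_counts.get(name, 0) >= threshold
        ]
```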

In some client projects, we at Expleo offer a portal for data requests and build pipelines with integrated tooling to increase automation. This also encourages our clients to choose a sustainable test data solution aligned with their data strategies.

Most clients don't see Test Data Management as a driver of efficiency; they see it only as a consequence of the GDPR or other regulations. But test data is the basis for continuous delivery and a major driver of efficiency in the testing process. No other step takes longer than getting the right data for a test. In addition, good Test Data Management can reduce storage for logs and databases as well as batch run times in tests.

In times of fast delivery, Test Data Management should be a core topic in test planning, and it therefore needs defined processes. These processes bring transparency to test data provisioning and also make Test Data Management measurable. For example, suppose I served 2,500,000 test data requests per year for 40,000 functional test cases. That is 62.5 requests per test case, which points to a mismatch between test data provisioning and test cases.
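The calculation behind this example is simple, but automating it makes the metric trackable over time. In this sketch, the acceptable maximum of 5 requests per test case is an illustrative assumption, not a benchmark from the text.

```python
def requests_per_test_case(data_requests: int, test_cases: int) -> float:
    """Average number of test data requests served per functional test case."""
    if test_cases <= 0:
        raise ValueError("test_cases must be positive")
    return data_requests / test_cases


def is_mismatch(ratio: float, expected_max: float = 5.0) -> bool:
    """Flag a ratio above the agreed maximum (5.0 is an illustrative assumption, not a benchmark)."""
    return ratio > expected_max


ratio = requests_per_test_case(2_500_000, 40_000)
print(ratio, is_mismatch(ratio))  # 62.5 True
```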

Test-case-based data is required for an efficient Test Data Management solution. As a basic prerequisite, the different types of test data suitable for a test case must be loaded into the test environment before the test is executed.

This includes:

  • Corporate-wide Central Data such as zip codes, valid addresses etc.
  • Configuration Data such as test specific branch structures, addresses of externally addressable third party test environments, key names for data transfer etc. (separation from production)
  • Master data such as customers, accounts, products etc.
  • Transaction Data or Update Master Data such as ongoing orders, open invoices, upcoming reminders, etc. This data in particular has temporal dependencies, so the scheduling of test execution must be taken into account.
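The four categories suggest a natural load order, since each layer tends to reference the ones above it. The temporal dependencies of transaction data can also be checked against the planned execution date. This is a minimal sketch under that layering assumption; `DataCategory`, `DataSet` and the example data sets are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class DataCategory(IntEnum):
    """Lower values are loaded first (assumes each layer only references the layers above it)."""
    CENTRAL = 1         # corporate-wide data: zip codes, valid addresses
    CONFIGURATION = 2   # test-specific branch structures, third-party endpoints, key names
    MASTER = 3          # customers, accounts, products
    TRANSACTION = 4     # ongoing orders, open invoices, upcoming reminders


@dataclass
class DataSet:
    name: str
    category: DataCategory
    due_date: Optional[date] = None   # only meaningful for time-dependent transaction data


def load_plan(data_sets: list[DataSet]) -> list[DataSet]:
    """Order data sets by category; sorted() is stable, so order within a category is kept."""
    return sorted(data_sets, key=lambda d: d.category)


def stale_transactions(data_sets: list[DataSet], execution_date: date) -> list[str]:
    """Names of transaction data sets whose due date has already passed on the planned execution date."""
    return [
        d.name
        for d in data_sets
        if d.category is DataCategory.TRANSACTION
        and d.due_date is not None
        and d.due_date < execution_date
    ]
```

A stale transaction found this way would either have to be recreated or the test execution rescheduled.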

To see how The ACT Framework can boost your team’s testing success at all levels, download the whitepaper.

About the Author

Helmut Körfer

With over nine years of experience in Test Data Management and Test Environment Solutions (not only in Test Factories), I have provided testing services to several clients in banking, insurance, automotive, retail and logistics. I have worked in different roles within the testing process, from Project Management to Test Data Management and as an Automation Expert. Today I am Senior Manager for Strategic Sales and Engagements at Expleo.
