There’s a familiar moment in many software delivery teams. The sprint is planned, automation scripts are ready and testing is about to begin. Then someone asks the inevitable question: “Where are we getting the test data from?”
In many organisations, the response is still the same: copy production.
On the surface, it makes sense. Production data already reflects real-world behaviour and contains the complexity teams need for testing.
The problem is that while copying production data feels convenient in the short term, it creates operational, security and scalability issues that become harder to manage as delivery speeds increase.
Creating a production copy for testing is rarely as simple as it sounds.
First, organisations need infrastructure capable of hosting large production-scale datasets, which in enterprise environments can reach multiple terabytes. From there, teams usually subset the data into something smaller and more practical for testing. In many cases, this relies on legacy scripts or manual processes that have evolved over time without consistent ownership.
Next comes masking and anonymisation, followed by validation to ensure the data still behaves correctly after being altered. Only then can testing begin.
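To make the masking and validation steps concrete, here is a minimal sketch of one common approach: deterministic masking with a keyed hash, followed by a referential-integrity check. The table shapes, the `SECRET_KEY` value and the helper names `mask_value` and `orphaned_orders` are assumptions made for illustration, not a description of any particular tool.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would live in a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_value(value: str, prefix: str = "cust") -> str:
    """Replace a sensitive value with a stable, keyed pseudonym."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


# Illustrative rows standing in for a subset of production data.
customers = [
    {"email": "ana@example.com", "tier": "gold"},
    {"email": "raj@example.com", "tier": "basic"},
]
orders = [
    {"order_id": 1, "customer_email": "ana@example.com"},
    {"order_id": 2, "customer_email": "raj@example.com"},
    {"order_id": 3, "customer_email": "ana@example.com"},
]

# The same input always maps to the same pseudonym, so joins survive masking.
masked_customers = [{**c, "email": mask_value(c["email"])} for c in customers]
masked_orders = [
    {**o, "customer_email": mask_value(o["customer_email"])} for o in orders
]


def orphaned_orders(customer_rows, order_rows):
    """Validation step: list orders whose customer no longer exists."""
    known = {c["email"] for c in customer_rows}
    return [o["order_id"] for o in order_rows if o["customer_email"] not in known]


print(orphaned_orders(masked_customers, masked_orders))
```

Because the pseudonym is derived from the original value and a secret key, the customer and order tables still join after masking. Random replacement would need a separate mapping table to achieve the same result.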
By then, considerable time may have passed, and the environment immediately starts falling out of sync with production as new transactions, schema updates or configuration changes occur. The whole refresh cycle then has to start again.
There is also a significant infrastructure impact. Maintaining multiple non-production environments based on production copies can consume storage volumes comparable to, or even larger than, the live production estate itself.
For organisations operating with rapid release cycles or continuous delivery pipelines, this model increasingly struggles to keep pace.
The operational overhead is only one side of the problem.
Production systems often contain highly sensitive information, including customer records, employee data, financial details and regulated personal information. Once that data is replicated into testing environments, it becomes accessible to a much wider audience than intended.
Developers, third parties, contractors and automated tooling may all gain access to environments that typically have weaker controls than production systems.
Masking processes can reduce risk, but they are often inconsistent. New fields appear in production schemas without updated masking rules. Sensitive attributes can be overlooked. Data that should never leave production environments sometimes ends up widely distributed across testing landscapes.
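One way to catch the "new field without a masking rule" gap is to compare the live schema with the masking configuration before every refresh and stop the job if anything is unclassified. The sketch below assumes a simple dictionary-based rule set; `MASKING_RULES`, `current_schema` and `unclassified_columns` are hypothetical names used only for illustration.

```python
# Hypothetical masking configuration: every column must be classified explicitly,
# including the ones that are safe to keep as-is.
MASKING_RULES = {
    "customers": {
        "customer_id": "keep",
        "email": "hash",
        "full_name": "fake_name",
        "tier": "keep",
    },
}

# Illustrative schema as it might be read from the database catalogue.
# "date_of_birth" was added in production after the rules were written.
current_schema = {
    "customers": ["customer_id", "email", "full_name", "tier", "date_of_birth"],
}


def unclassified_columns(schema, rules):
    """Return columns that exist in the schema but have no masking decision."""
    gaps = {}
    for table, columns in schema.items():
        covered = rules.get(table, {})
        missing = sorted(set(columns) - set(covered))
        if missing:
            gaps[table] = missing
    return gaps


gaps = unclassified_columns(current_schema, MASKING_RULES)
if gaps:
    # In a pipeline this would fail the refresh rather than just print.
    print(f"Refresh blocked, unclassified columns: {gaps}")
```

Requiring an explicit "keep" decision for harmless columns is a deliberate choice in this sketch: it means a new column is treated as sensitive until someone classifies it.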
These issues often remain hidden until an audit or security incident exposes them.
Regulatory scrutiny has also intensified in recent years. GDPR enforcement activity continues to expand across sectors including healthcare, financial services, retail and energy, increasing the pressure on organisations to demonstrate stronger governance around non-production data usage.
As a result, treating production data copies as a low-risk testing shortcut is becoming far harder to justify.
The regulatory landscape is also evolving beyond data privacy alone. Frameworks such as the European Union NIS2 Directive are increasing expectations around cyber resilience, access management, operational security and supply chain oversight across critical and digitally dependent organisations.
Under NIS2, organisations are expected to demonstrate stronger control over how systems and sensitive data are accessed, managed and protected throughout the software lifecycle.
That places additional scrutiny on non-production environments containing copied production data, particularly where governance, monitoring and access controls are inconsistent.
Even when copied environments are built correctly, they still age quickly.
Production snapshots represent a single point in time. As systems evolve, the copied environment gradually becomes less representative of live conditions. New customer behaviours, pricing changes, product configurations and recent transactions are missing from the test landscape.
That creates a growing disconnect between testing conditions and production reality.
Tests may pass against outdated datasets while failing in live environments where configurations and transaction patterns have changed.
There is also a practical usability issue. Full production copies contain huge volumes of irrelevant data that offer little value for specific testing objectives. Teams often spend unnecessary time searching for usable records or manually shaping data for particular scenarios.
Instead of accelerating delivery, test data management becomes another bottleneck in the release process.
The limitations of production-copy testing stand out even more in modern distributed environments.
Today’s enterprise ecosystems often span multiple interconnected platforms, including ERP systems, CRM tools, cloud services, payment engines, APIs and third-party integrations. A single workflow may depend on several systems operating together consistently.
When each environment is copied and refreshed independently, alignment issues quickly emerge. Relationships between systems drift apart. IDs no longer reconcile correctly. Time-sensitive data falls out of sequence. Integration testing failures occur for environmental reasons rather than genuine application defects.
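A lightweight pre-flight check can help separate environmental failures from genuine defects. The sketch below assumes two independently refreshed systems, a CRM and a billing platform, each exposing its snapshot time and a few records; the data shapes and the `integration_drift` helper are hypothetical.

```python
from datetime import datetime, timedelta

# Illustrative snapshots of two systems refreshed three days apart.
crm = {
    "snapshot_at": "2026-01-10T02:00:00",
    "customers": [{"customer_id": 101}, {"customer_id": 102}],
}
billing = {
    "snapshot_at": "2026-01-13T02:00:00",
    "invoices": [
        {"invoice_id": "INV-1", "customer_id": 101},
        {"invoice_id": "INV-2", "customer_id": 103},  # created after the CRM copy
    ],
}


def integration_drift(crm_env, billing_env, max_skew=timedelta(hours=1)):
    """Report IDs that do not reconcile and how far apart the snapshots are."""
    crm_ids = {c["customer_id"] for c in crm_env["customers"]}
    invoice_ids = {i["customer_id"] for i in billing_env["invoices"]}
    unmatched = sorted(invoice_ids - crm_ids)
    skew = abs(
        datetime.fromisoformat(crm_env["snapshot_at"])
        - datetime.fromisoformat(billing_env["snapshot_at"])
    )
    return {
        "unmatched_customer_ids": unmatched,
        "snapshot_skew": skew,
        "aligned": not unmatched and skew <= max_skew,
    }


report = integration_drift(crm, billing)
print(report)
```

Running a check like this before the integration suite makes it clear when a failure is caused by misaligned environments, so the team does not spend time debugging the application for a data problem.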
CI/CD and continuous testing have also changed expectations around delivery speed, making days-long environment refresh cycles increasingly impractical.
As a result, many organisations are beginning to treat test data as an automated component of software delivery rather than a manually maintained by-product of production systems.
Teams moving away from production cloning are not eliminating test data management; they are redesigning it.
Rather than copying entire environments and trimming them down afterwards, modern approaches focus on generating only the data required for testing. The goal is to create production-realistic datasets with accurate relationships, built-in privacy controls and rapid provisioning times.
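As a rough illustration of the idea, the sketch below generates a small customer and order dataset from scratch, with valid foreign keys and a fixed seed so every pipeline run gets the same data. It uses only Python's standard library and uniform random values; the field names and the `generate_dataset` function are assumptions, and dedicated tools go much further by modelling the distributions and correlations found in production.

```python
import random
from datetime import date, timedelta


def generate_dataset(n_customers: int, max_orders: int, seed: int = 42):
    """Generate customers and orders with consistent keys and no real data."""
    rng = random.Random(seed)  # fixed seed makes each refresh repeatable
    start = date(2025, 1, 1)
    customers, orders = [], []
    for customer_id in range(1, n_customers + 1):
        customers.append(
            {
                "customer_id": customer_id,
                "email": f"user{customer_id}@example.test",
                "tier": rng.choice(["basic", "silver", "gold"]),
            }
        )
        # Each order is created against an existing customer, so no orphans.
        for _ in range(rng.randint(0, max_orders)):
            orders.append(
                {
                    "order_id": len(orders) + 1,
                    "customer_id": customer_id,
                    "amount_pence": rng.randint(100, 250_000),
                    "placed_on": (
                        start + timedelta(days=rng.randrange(365))
                    ).isoformat(),
                }
            )
    return customers, orders


customers, orders = generate_dataset(n_customers=50, max_orders=5)
print(len(customers), "customers,", len(orders), "orders")
```

Because nothing is copied from live systems, there is no personal data to mask, and the dataset can be sized to the test instead of to the production estate.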
This allows test environments to integrate directly into automated delivery pipelines, making refreshes faster, repeatable and less dependent on manual intervention.
It also enables organisations to create targeted datasets for specific edge cases, compliance scenarios or rare workflows that may not exist naturally within production snapshots.
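Targeted datasets can be as simple as a catalogue of named scenarios that are generated on demand. The scenarios, field names and the `edge_case_orders` helper below are hypothetical examples chosen for illustration.

```python
from datetime import date


def edge_case_orders(customer_id: int):
    """Build orders that rarely, if ever, appear in a production snapshot."""
    return [
        {
            "scenario": "zero_value_order",
            "customer_id": customer_id,
            "amount_pence": 0,
            "placed_on": date(2025, 6, 1).isoformat(),
        },
        {
            "scenario": "leap_day_order",
            "customer_id": customer_id,
            "amount_pence": 1999,
            "placed_on": date(2024, 2, 29).isoformat(),
        },
        {
            "scenario": "full_refund",
            "customer_id": customer_id,
            "amount_pence": -4999,  # negative amount represents a refund here
            "placed_on": date(2025, 3, 15).isoformat(),
        },
        {
            "scenario": "year_end_boundary",
            "customer_id": customer_id,
            "amount_pence": 2500,
            "placed_on": date(2025, 12, 31).isoformat(),
        },
    ]


for order in edge_case_orders(customer_id=1):
    print(order["scenario"], order["placed_on"], order["amount_pence"])
```

Naming each scenario keeps the link between a test and the data it depends on explicit, which is hard to achieve when testers have to search a production copy for a matching record.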
From a governance perspective, the model changes significantly as well. Privacy protection becomes embedded into the way test data is generated, rather than being treated as a separate masking step applied after copying live records.
That reduces the risk of exposing real personal information in non-production environments while making test data easier to manage at scale.
At Synthesized, we partner with Resillion to help organisations modernise how test data is managed across the software delivery lifecycle. By combining synthetic data generation, automated masking, subsetting and provisioning, organisations can accelerate testing and strengthen compliance with GDPR and NIS2 without relying on large-scale production copies.
I’ll be presenting at a Resillion webinar on 30 June 2026 with Conor Thomson, where we’ll explore how test data automation can help organisations reduce production data risk, preserve realism and strengthen assurance for AI and NIS2.