Data Fabrication & Data Masking with Virtual Data Pipeline

Written by James Hartwright | Jul 24, 2019 5:14:51 AM

Vincent McBurney and I recently sat down to talk about Virtual Data Pipeline (VDP) and what it can help with. This post is part of a five-part Q+A series. Today, blog #2 covers data fabrication and data masking with VDP, following blog #1, in which we discussed what Virtual Data Pipeline is.

Virtual Data Pipeline is a tool that delivers secure, virtual copies of production and test data quickly, without costing you an arm and a leg in licensing and storage. It can speed up the provisioning of data for application testing without a matching increase in cost. I wanted to follow up on some questions that I’d received from customers after talking with them about VDP.

How does data masking work with Virtual Data Pipeline?

VDP doesn't magically solve data masking problems. It doesn't automatically mask data, so you still need a masking server such as Optim data masking. You still need to identify what in your data requires masking, i.e. what is personally identifying data and what is commercial-in-confidence data, and you still need to set up the rules to mask that data on your masking server. VDP will then automatically apply that masking to the data that it provisions as an official copy.
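To make the "identify the sensitive columns, then define rules" step concrete, here is a minimal Python sketch. It is not Optim or VDP syntax; the column names, the rule table and the functions `mask_name`, `mask_email` and `mask_row` are all hypothetical, and a real masking server would offer far richer rules.

```python
import hashlib

# Hypothetical masking functions. Illustrative only: this is not how
# Optim or VDP define their rules.

def mask_name(value: str) -> str:
    """Replace a name with a stable pseudonym derived from a hash."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"Person_{digest[:8]}"

def mask_email(value: str) -> str:
    """Keep the domain but hide the local part of an email address."""
    local, _, domain = value.partition("@")
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()
    return f"user_{digest[:8]}@{domain}"

# Step 1: identify which columns need masking.
# Step 2: attach a rule to each of them.
MASKING_RULES = {
    "customer_name": mask_name,
    "email": mask_email,
}

def mask_row(row: dict) -> dict:
    """Apply the rule for each sensitive column; pass other columns through."""
    return {
        column: MASKING_RULES[column](value) if column in MASKING_RULES else value
        for column, value in row.items()
    }

row = {"customer_name": "Jane Citizen", "email": "jane@example.com", "order_total": 120.50}
masked = mask_row(row)
```

Because the pseudonyms come from a hash, the same input always masks to the same output, which keeps joins between masked tables consistent. That is a design choice for this sketch, not a statement about how any particular masking product behaves.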

So can this be done on a user group basis?

Within VDP there is a workflow tool, which means that you can set up custom workflows for when you want data to be masked and who you want it to be masked for. If one party needs to see the actual data, a small group of users can see the real data while other parties get more limited views.
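As a rough illustration of the idea of masking by user group, the Python sketch below returns real data to one named group and redacted data to everyone else. The group name, the column list and the `provision_row` function are assumptions made up for this example; they are not VDP workflow configuration.

```python
# Hypothetical policy: which groups may see real data, and which columns
# are sensitive. Neither is taken from VDP itself.
UNMASKED_GROUPS = {"fraud_investigators"}
SENSITIVE_COLUMNS = {"customer_name", "email"}

def provision_row(row: dict, user_group: str) -> dict:
    """Return the real row for trusted groups, a redacted copy for others."""
    if user_group in UNMASKED_GROUPS:
        return dict(row)
    return {
        column: "****" if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }

source_row = {"customer_name": "Jane Citizen", "email": "jane@example.com", "order_total": 120.50}
investigator_view = provision_row(source_row, "fraud_investigators")
tester_view = provision_row(source_row, "app_testers")
```

Here `investigator_view` carries the original values, while `tester_view` keeps `order_total` but replaces the name and email with a placeholder.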

Can you create synthetic test data?

That's where IBM’s test data fabrication comes in. VDP is very good at copying data that exists in databases. But sometimes you want to fabricate data for your testing and, in a warehouse scenario, you often want test data that triggers a rule, is out of bounds, or is meant to cause a failed test. So you would use a test data fabrication tool, such as the IBM Optim suite, and a test data management tool to save off those datasets and pass them to VDP when VDP is creating a new test environment for you. VDP is very good at receiving production data and generating test data, but you still need another part of the solution to create the test cases that you need to put into VDP.
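For a sense of what "data that is out of bounds or meant to cause a failed test" can look like, here is a small Python sketch that fabricates boundary cases around a made-up warehouse rule. The rule, its limits and the names `FabricatedCase`, `is_valid_order_total` and `fabricate_boundary_cases` are hypothetical; a dedicated fabrication tool would generate such cases from its own rule definitions.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class FabricatedCase:
    label: str
    order_total: float
    expect_valid: bool

def is_valid_order_total(value: float, low: float = 0.0, high: float = 10_000.0) -> bool:
    """Hypothetical warehouse rule: totals must fall within [low, high]."""
    return low <= value <= high

def fabricate_boundary_cases(low: float = 0.0, high: float = 10_000.0) -> List[FabricatedCase]:
    """Build cases on and just outside each bound, with the expected outcome."""
    return [
        FabricatedCase("just below lower bound", low - 0.01, False),
        FabricatedCase("at lower bound", low, True),
        FabricatedCase("at upper bound", high, True),
        FabricatedCase("just above upper bound", high + 0.01, False),
    ]

cases = fabricate_boundary_cases()
```

Two of the four cases are deliberately invalid, so a test run over this dataset should see the rule reject them. Saved off as a dataset, cases like these are the kind of thing you would hand over when a new test environment is provisioned.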

This best-in-class tool for development and testing practices is incredibly useful for provisioning the data you need to accelerate your application development process. However, as we’ve discussed, it doesn’t solve all the problems associated with data masking, which we need to be especially careful about when working with production data. We discuss the above questions at the 11:30 mark in our VDP webinar, which you can find at the link below.

Want to find out how VDP can work for your DevOps team? Get in touch with us at datawarehousing@certussolutions.com or find out more via the webinar we created discussing Virtual Data Pipeline and its use cases: https://www.certussolutions.com/virtual-data-pipeline-webinar
