Data Obfuscation Definition

Data obfuscation consists of altering sensitive data or personally identifiable information (PII) in order to protect confidential information in non-production databases. Obfuscation is successful when the data keeps its referential integrity and its original characteristics, which ensures that the development, testing, and installation of applications still work as expected.

Data obfuscation is also known as data masking. Other terms used alongside it include data scrubbing, de-identification, depersonalization, and data scrambling.

In data obfuscation, the format of the data remains the same; only the values change. The values can be changed in different ways, such as encryption, character shuffling, or word substitution.
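As a rough illustration of changing values while keeping the format, the sketch below replaces each ASCII letter and digit with a random character of the same kind and leaves separators untouched. The function name `mask_preserving_format` is hypothetical, and this is only one simple form of character substitution, not a description of any particular product.

```python
import random
import string


def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each ASCII letter or digit with a random one of the same kind.

    Assumption: input is ASCII. Any other character (separators, spaces,
    non-ASCII letters) is copied unchanged, so the layout stays the same.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


# Example: same length, same separators, different letters and digits.
masked = mask_preserving_format("Jane Doe, 555-12-3456", random.Random())
```

Because the replacements are random and no mapping is stored, the output cannot be turned back into the original. The trade-off is that the same input gives a different output on every call, so fields that act as keys need a consistent mapping instead.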

Uses

This technique is used in non-production environments, which usually have less security protection than production and are therefore more prone to information leaks.

We may think that the greatest risk to our information comes from outside the company, and for that reason we invest in perimeter security. The need to protect the network is not in question.

In many cases, however, the information is exposed from the inside, by malicious people with internal access who find unprotected data. It is appropriate to ask:

  • Who can see my business data in non-productive environments?
  • Are the people who test, develop, and certify my applications employees or outsourced staff?

Considerations for Data Obfuscation

The following are some of the things you should keep in mind when designing or choosing a solution for obfuscating sensitive data:

Not reversible:

It should not be possible to recover the original sensitive data once the masking process has been applied. If the process can be reversed to recover the sensitive data, the masking does not fulfill its purpose.

The masked data should resemble the production data:

This is one of the key points to consider. The masked data should resemble live data, because otherwise the tests may not be valid. When you design or purchase a solution for masking sensitive data, this should be one of the first points you check.

Maintenance of referential integrity:

If the data field being masked is a primary key, any foreign key that refers to it must also refer to the masked value. Otherwise, referential integrity is not maintained, and some table will contain a foreign key that does not correspond to any primary key.
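One way to keep keys consistent is to derive the masked value deterministically from the original, so that the same original ID gives the same masked ID in every table. The sketch below does this with a keyed hash (HMAC-SHA256) from the Python standard library. The secret, the `mask_key` helper, and the two sample tables are assumptions made for illustration only.

```python
import hashlib
import hmac

# Hypothetical secret for illustration. The mapping is only hard to rebuild
# if this value never reaches the non-production environment.
SECRET_KEY = b"example-only-secret"


def mask_key(original_id: int) -> int:
    """Map an ID to a stable masked ID: same input, same output."""
    digest = hmac.new(SECRET_KEY, str(original_id).encode(), hashlib.sha256).digest()
    # Keep 7 bytes so the result fits in a signed 64-bit integer column.
    # Collisions are unlikely but possible; a real tool should check uniqueness.
    return int.from_bytes(digest[:7], "big")


# Sample parent and child tables (non-sensitive columns only, for brevity).
customers = [{"customer_id": 101, "tier": "gold"}, {"customer_id": 102, "tier": "basic"}]
orders = [
    {"order_id": 1, "customer_id": 101},
    {"order_id": 2, "customer_id": 102},
    {"order_id": 3, "customer_id": 101},
]

# Apply the same function to the primary key and to every foreign key.
masked_customers = [{**c, "customer_id": mask_key(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": mask_key(o["customer_id"])} for o in orders]

# Check: every masked foreign key still points at a masked primary key.
primary_keys = {c["customer_id"] for c in masked_customers}
orphans = [o for o in masked_orders if o["customer_id"] not in primary_keys]
```

With this approach `orphans` stays empty, because the parent and child tables go through the same mapping. Masking each table with independent random values would break that link.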

Repeatable:

Data obfuscation must be a repeatable process. Production data changes frequently, sometimes within a few hours. If the masking solution supports masking only once, newly added records may be left unmasked.
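A minimal sketch of a repeatable run, assuming rows carry a stable `id` and the masked copy is kept as a dictionary: each run masks only the rows it has not seen yet, and the masking function is deterministic so a re-run gives the same values. The names `mask_email` and `refresh_masked_copy`, the secret, and the sample addresses are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"example-only-secret"  # hypothetical secret for illustration


def mask_email(email: str) -> str:
    """Deterministically replace an e-mail address with a pseudonymous one."""
    digest = hmac.new(SECRET_KEY, email.lower().encode(), hashlib.sha256).hexdigest()
    return f"user_{digest[:10]}@example.com"


def refresh_masked_copy(production_rows, masked_copy):
    """Mask rows not yet present in masked_copy; return how many were added."""
    added = 0
    for row in production_rows:
        if row["id"] not in masked_copy:
            masked_copy[row["id"]] = {"id": row["id"], "email": mask_email(row["email"])}
            added += 1
    return added


production = [
    {"id": 1, "email": "alice@corp.test"},
    {"id": 2, "email": "bob@corp.test"},
]
masked = {}
first_run = refresh_masked_copy(production, masked)   # masks both rows

# A new record arrives in production; the next run picks up only that row.
production.append({"id": 3, "email": "carol@corp.test"})
second_run = refresh_masked_copy(production, masked)  # masks one new row
```

This sketch only handles inserts. Rows that are updated or deleted in production would need extra handling, for example by comparing a modification timestamp.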

Database Integrity:

In addition to maintaining referential integrity, the data obfuscation solution should also take into account triggers, keys, indexes, etc. It should be able to discover the relationships between the different database objects automatically and maintain their status accordingly.

Masking of pre-packaged data:

When buying a data obfuscation solution, you should also check whether it includes pre-packaged masked data for common requirements, such as credit card numbers, social security numbers, etc. The solution in question should ship with prepared sample data.
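To give an idea of what such a pre-packaged routine might do for credit card numbers, the sketch below replaces the digits with random ones and then recomputes the final check digit with the Luhn algorithm, so the masked number still passes basic validation in the application under test. The function names are hypothetical, and commercial tools may take a different approach, for example by preserving the issuer prefix.

```python
import random

DIGITS = "0123456789"


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn check."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check a digits-only string against the Luhn algorithm."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # every second digit, starting left of the check digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card_number(card: str, rng: random.Random) -> str:
    """Randomize the digits of a card number, keeping separators and validity."""
    digit_count = sum(1 for c in card if c in DIGITS)
    body = "".join(rng.choice(DIGITS) for _ in range(max(digit_count - 1, 0)))
    new_digits = iter(body + luhn_check_digit(body))
    # Put the new digits back in the original positions.
    return "".join(next(new_digits) if c in DIGITS else c for c in card)


masked_card = mask_card_number("4111-1111-1111-1111", random.Random())
```

The input used here, 4111 1111 1111 1111, is a widely used test number rather than a real card. The masked result keeps the same length and dash positions and still satisfies `luhn_valid` once the dashes are removed.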

Keep in mind that this is not an exhaustive list of features. These are just some of the features that we believe you should consider. Most commercial solutions today have many more, and you should do a thorough evaluation before choosing your data masking solution.