arx
Add data masking functionality
Hi, I am not sure which generalizations or aggregate functions exist for string or categorical data. However, for an attribute marked as Identifying, whether I set the transformation to Generalization or to Microaggregation, and whichever aggregation function I choose for microaggregation, the values of the attribute remain "*" in the output and the transformation is reset to Generalization. I am also wondering whether you plan to support salted hashing transformations. I believe these are useful for identifiers: they are less anonymous than "*" fields, but helpful when creating a test database where we want to minimize private data while still being able to do verifications and correlations. Thanks
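For illustration, a salted hashing transformation of the kind asked about above could look roughly like the following. This is a minimal sketch in plain Python and not part of ARX; the function name `pseudonymize`, the truncation length, and the sample names are hypothetical. It uses a keyed hash (HMAC-SHA256) so that equal inputs map to equal tokens under the same secret salt, which is what keeps correlations possible.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, salt: bytes, length: int = 16) -> str:
    """Return a keyed hash of `value`, truncated to `length` hex characters.

    Hypothetical helper for illustration only; it is not an ARX function.
    """
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# The salt acts as a secret key: anyone who knows it can test guesses
# against the tokens, so it has to be stored separately from the data.
salt = secrets.token_bytes(32)

names = ["Alice", "Bob", "Alice"]
tokens = [pseudonymize(name, salt) for name in names]
```

Both occurrences of "Alice" receive the same token, so joins across tables still work, while a different salt yields unrelated tokens for the same input.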
Hi Nicu,
thanks for your interest in ARX!
Currently, data in "identifying" fields can only be deleted, i.e. replaced with "*". We are working on integrating a new perspective into ARX for data masking, which will support various types of data masking for identifying fields, but it will take more time before this is released.
If you are able to select "generalization" or "microaggregation", then you must be using "quasi-identifying" fields. In that case, "microaggregation" being reset to "generalization" may be a bug. If so: can you reproduce this with our current master branch? Can you provide a minimal working example?
Best Fabian
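As a rough picture of the current behaviour described in this reply (every value of an identifying attribute is replaced with "*"), the effect on a table can be sketched as follows. This is plain Python for illustration and does not use the ARX API; the function name `suppress` and the sample rows are hypothetical.

```python
def suppress(rows, identifying):
    """Replace every value of the given identifying attributes with "*".

    Hypothetical helper for illustration only; it is not an ARX function.
    """
    return [
        {key: ("*" if key in identifying else value) for key, value in row.items()}
        for row in rows
    ]


rows = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 51},
]
masked_rows = suppress(rows, identifying={"name"})
```

Only the attributes listed as identifying are touched; all other columns are passed through unchanged.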
Hi, what masking techniques do you have in mind? What I am thinking of is what I would call a cardinality-decreasing pseudonymization technique. For example, given 1000 unique input names, the output would contain 10 unique names (random names that are not related to the input names by any function). Similarly, an input ID from a set of cardinality 1000 would be rewritten to a random ID from a set with a configured, much lower cardinality, e.g. 10 or 100.
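The cardinality-decreasing idea described above could be sketched like this. It is an assumption-laden illustration in plain Python, not something ARX provides; the function name `reduce_cardinality`, the pool of output IDs, and the fixed seed are hypothetical.

```python
import random


def reduce_cardinality(values, pool, seed=None):
    """Map each distinct input value to a randomly chosen value from `pool`.

    The same input always receives the same output within one call, and the
    number of distinct outputs is at most len(pool). Hypothetical helper for
    illustration only; it is not an ARX function.
    """
    # random.Random is not cryptographically secure; random.SystemRandom()
    # would be a safer source of randomness for real data.
    rng = random.Random(seed)
    mapping = {}
    masked = []
    for value in values:
        if value not in mapping:
            mapping[value] = rng.choice(pool)
        masked.append(mapping[value])
    return masked, mapping


# 1000 unique input IDs (one of them repeated) rewritten to at most 10 IDs.
input_ids = [f"id{i}" for i in range(1000)] + ["id0"]
output_pool = [f"pseudo{i}" for i in range(10)]
masked_ids, mapping = reduce_cardinality(input_ids, output_pool, seed=42)
```

Because the assignment is random rather than computed from the input, the output carries no functional relation to the original value. The mapping table is the only link back and can be discarded after masking.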
(Sent by email in reply to Fabian Prasser's comment of Tue, Jun 27, 2017, 4:21 PM.)
Hi Nicu,
these types of transformation will definitely be supported. I have changed the title of this issue and will leave it open for now. I will close it as soon as we have released a version of ARX with the described enhancements implemented. However, as noted above, please be patient. This will take time.
Best regards Fabian