data-anonymization
Bulk table updates
Not an issue per se, but I've been adding a bulk table update method to a fork of your project and thought you might be interested. It's relatively simplistic at the moment, but the general gist is:
bulk_table 'my_table' do
  where "some_column != 'some value'"
  anonymize('pii_column') { 'xxxxxxxx' }
end
Since the where filter is passed straight through to ActiveRecord, it can be a hash or can include a subquery filter. The anonymisation currently just passes a random string through to the strategy, but it could be made a bit smarter, e.g. by looking at the column type. For my purposes I'm just using the Anonymous strategy with a block, as above.
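To make the shape of the DSL concrete, here is a minimal, database-free sketch of how such a block could be collected. It is not the code from the fork: `BulkTableSpec`, `filter`, `columns` and `assignments` are hypothetical names chosen for illustration, and the filter is simply stored untouched so that it could later be handed to ActiveRecord's `where` as either a string or a hash.

```ruby
# Hypothetical sketch, not the fork's implementation: collects the filter and
# the per-column anonymizers declared inside a bulk_table block.
class BulkTableSpec
  attr_reader :table_name, :filter, :columns

  def initialize(table_name)
    @table_name = table_name
    @filter = nil
    @columns = {}
  end

  # Stores the condition as given (String or Hash) without interpreting it.
  def where(condition)
    @filter = condition
  end

  # Remembers the block that produces the replacement value for a column.
  def anonymize(column, &block)
    @columns[column] = block
  end

  # Calls each block once, giving one value per column for all matching rows.
  def assignments
    @columns.transform_values(&:call)
  end
end

# Evaluates the block against a fresh spec so `where` and `anonymize`
# resolve to the methods above.
def bulk_table(name, &block)
  spec = BulkTableSpec.new(name)
  spec.instance_eval(&block)
  spec
end

spec = bulk_table 'my_table' do
  where "some_column != 'some value'"
  anonymize('pii_column') { 'xxxxxxxx' }
end
```

With a real model, the collected pieces would presumably end up in something like `SomeModel.where(spec.filter).update_all(spec.assignments)`, where `SomeModel` is a placeholder; that would issue a single UPDATE rather than loading rows into memory.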
Another thought might be to simplify things even further by passing the query itself through as a param. Something like:
bulk_table ... do
  with_query do |query|
    # Leading dots keep both calls in one chain, so the block returns the fully scoped query
    query
      .joins('join other_table... ')
      .where(other_table: { value: 'bar' })
  end
end
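The `with_query` idea boils down to yielding the current query to the block and keeping whatever the block returns. A rough sketch, under the assumption that the block's return value replaces the base query: `RecordingQuery` is a hypothetical stand-in for an ActiveRecord relation that only records the chained calls, and `BulkQuerySpec` is likewise an illustrative name, not part of the fork.

```ruby
# Hypothetical stand-in for an ActiveRecord relation: it records chained
# calls instead of building SQL, returning a new object each time.
class RecordingQuery
  attr_reader :calls

  def initialize(calls = [])
    @calls = calls
  end

  def joins(*args)
    RecordingQuery.new(calls + [[:joins, args]])
  end

  def where(*args)
    RecordingQuery.new(calls + [[:where, args]])
  end
end

# Hypothetical sketch: holds the query that the bulk update would run against.
class BulkQuerySpec
  attr_reader :query

  def initialize(base_query)
    @query = base_query
  end

  # Yields the current query and keeps whatever the block returns.
  def with_query
    @query = yield(@query)
  end
end

spec = BulkQuerySpec.new(RecordingQuery.new)
spec.with_query do |query|
  query
    .joins('join other_table... ')
    .where(other_table: { value: 'bar' })
end
```

Because the block's return value is what gets kept, a chain that is accidentally split into two statements would silently drop the earlier calls, which is one argument for validating the returned object before running the update.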
I'm not sure there is a nice way to do cross-connection copies other than dump and load. Doing that in memory seemed a bit crazy (and also didn't fit my use case), so for now it only supports anonymising the source DB:
https://github.com/Studiosity/data-anonymization/commit/cdfcfecb8b2cc129fa1f6933e58b9260bb765820
I am working on porting this tool to Java/Kotlin for better performance. If you want to try an early version, you can find it here: https://github.com/dataanon/data-anon
Sample project: https://github.com/dataanon/dataanon-kotlin-sample