data-anonymization
Impressive Speedup using activerecord-import
I have very large tables that I want to anonymize. A simple run of the anonymization code took nearly 40 minutes! So I tried to optimize the code a little and got it down to 5 minutes by using the activerecord-import gem. I update my records on a PostgreSQL 10 database using the Blacklist strategy.

The trick is not to save every single record individually, but to collect them and use the import method of activerecord-import with its on-duplicate-key-update strategy. The problem is that this only works for MySQL and PostgreSQL. To test it, add the gem 'activerecord-import', use my fork, and run the anonymization against a MySQL or PostgreSQL database.
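To illustrate why batching helps, here is a minimal plain-Ruby sketch of the kind of statement a batched upsert sends to PostgreSQL: one multi-row INSERT ... ON CONFLICT ... DO UPDATE per slice of records instead of one UPDATE per record (MySQL's counterpart is ON DUPLICATE KEY UPDATE). This is not the code from my fork and not activerecord-import's internals. The helper names (build_upsert_sql, upsert_in_batches), the table and columns, and the simplified quoting are all hypothetical. In the gem itself, per its README, the call looks roughly like Model.import records, on_duplicate_key_update: [:name, :email].

```ruby
# Hypothetical sketch: build one multi-row PostgreSQL upsert per batch of
# already-anonymized rows instead of issuing one UPDATE per record.
# Quoting here is simplified for illustration; real code should rely on the
# database driver / ActiveRecord for escaping.

def quote_value(value)
  case value
  when nil then "NULL"
  when Integer, Float then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

def quote_ident(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# rows: array of hashes sharing the same keys; key: the conflict column
# (must be the primary key or have a unique index for ON CONFLICT to work).
def build_upsert_sql(table, rows, key: :id)
  raise ArgumentError, "no rows" if rows.empty?

  columns = rows.first.keys
  values = rows.map do |row|
    "(" + columns.map { |c| quote_value(row[c]) }.join(", ") + ")"
  end
  # Every non-key column is overwritten with the incoming (anonymized) value.
  updates = (columns - [key]).map do |c|
    "#{quote_ident(c)} = EXCLUDED.#{quote_ident(c)}"
  end

  "INSERT INTO #{quote_ident(table)} (#{columns.map { |c| quote_ident(c) }.join(', ')}) " \
    "VALUES #{values.join(', ')} " \
    "ON CONFLICT (#{quote_ident(key)}) DO UPDATE SET #{updates.join(', ')}"
end

# Split a large table into slices so each statement stays a manageable size.
def upsert_in_batches(table, rows, batch_size: 1000, key: :id)
  rows.each_slice(batch_size).map { |batch| build_upsert_sql(table, batch, key: key) }
end
```

Because every anonymized row already exists in the table, each one hits the ON CONFLICT branch and becomes an update, so the database does the work of thousands of single-row saves in a handful of round trips.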
I could open a pull request, but I have only tested my own case and don't know whether anything else breaks. Would you be interested in such a feature?