masquerade
Delete data and anonymize the remaining records
The idea is to delete all of the older customer data (for example, customers created more than 30 days ago), so that the DB dump is a lot smaller and Masquerade runs faster. The remaining data should be anonymized so we can use it anywhere.
Example config:
    customer_grid_flat:
      provider:
        delete: true
        where: "`created_at` < now() - interval 30 day"
      columns:
        name:
          formatter: name
        email:
          formatter: email
          unique: true
          nullColumnBeforeRun: true
        dob:
          formatter: dateTimeThisCentury
          optional: true
        billing_full:
          ....
Currently, Masquerade just executes the delete and then moves on to the next table, leaving the remaining records in the table un-anonymized. Very logical, but it would be nice to have the possibility to delete AND anonymize.
What would be the best place to implement this feature?
Maybe @johnorourke might have an idea about this, since he built the delete part?
@peterjaap The original design for that was "you can either delete or anonymize, not both", but this is a good idea. We have several possible requirements:
- delete a selection of records
- anonymize a selection of records
- both (perhaps with different 'where' statements)
- none
So for maximum flexibility maybe we need to just allow different 'where' statements for anonymisation and deletion. However, delete: true previously switched off anonymisation!
Perhaps this approach:
- delete_where to specify the records to be deleted
- anonymize_where to specify the records to be anonymized
- 'where' would fill in both of those, which keeps backwards compatibility
- The system would run the delete first (if delete: true), then the anonymize, exactly as it does now.
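A minimal sketch of how the fallback in that proposal could be resolved, written in Python rather than Masquerade's PHP. The helpers resolve_plan and steps are hypothetical names and not part of Masquerade; only the keys delete, where, delete_where and anonymize_where come from the proposal above.

```python
def resolve_plan(provider: dict) -> dict:
    """Hypothetical helper: decide which rows to delete and which to anonymize."""
    shared = provider.get("where")  # legacy key, fills in both conditions
    return {
        "delete_where": provider.get("delete_where", shared),
        "anonymize_where": provider.get("anonymize_where", shared),
        "run_delete": bool(provider.get("delete", False)),
    }


def steps(provider: dict) -> list:
    """Hypothetical helper: ordered (action, condition) pairs, delete first."""
    plan = resolve_plan(provider)
    ordered = []
    if plan["run_delete"]:
        ordered.append(("delete", plan["delete_where"]))
    ordered.append(("anonymize", plan["anonymize_where"]))
    return ordered


legacy = {"delete": True, "where": "`created_at` < now() - interval 30 day"}
split = {
    "delete": True,
    "delete_where": "`created_at` < now() - interval 30 day",
    "anonymize_where": "`created_at` >= now() - interval 30 day",
}
for action, condition in steps(legacy) + steps(split):
    print(action, "WHERE", condition)
```

With the legacy config both steps receive the same condition, so existing configs keep working; with the split config each step gets its own condition.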
@IvanChepurnyi I can see your work on the DataProvider system, so it would be good to get your input on this. Should we avoid backward compatibility and go for a generic "actions" config, instead of using implicit actions? It's a balance between easy config with "sensible defaults", the learning curve for new users, and reducing unexpected behaviour.
@johnorourke I'm currently using delete_where in our builds (https://github.com/elgentos/masquerade/compare/master...Tjitse-E:feature/partial-delete). The only problem there is that it is not backwards compatible, but this could be solved (if needed) by keeping 'where'.
Adding both delete_where and anonymize_where seems like a good idea.
@johnorourke I like your approach, and if 'where' is used for both delete and anonymize it won't break existing behavior: the anonymization step will simply affect 0 rows, because the matching rows were already deleted.
There is probably an opportunity to hide this logic behind the TableConfigution class, as the checks for provider/where become quite complex. I will work on this issue next week.
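The "0 rows" point above can be checked with a small, self-contained sketch. It uses Python's sqlite3 module with an in-memory table as a stand-in for the real MySQL database; the customer table, its age_days column and the replacement email are assumptions made for illustration, and age_days > 30 stands in for the created_at condition from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, age_days INTEGER)"
)
conn.executemany(
    "INSERT INTO customer (email, age_days) VALUES (?, ?)",
    [("old@example.com", 90), ("new@example.com", 5)],
)

# One shared condition, used for both steps as with the legacy 'where' key.
where = "age_days > 30"

# Step 1: delete the rows matching the condition.
deleted = conn.execute(f"DELETE FROM customer WHERE {where}").rowcount

# Step 2: anonymize with the same condition; nothing is left to match.
anonymized = conn.execute(
    f"UPDATE customer SET email = 'anon@example.invalid' WHERE {where}"
).rowcount

remaining = conn.execute("SELECT email FROM customer").fetchall()
print(deleted, anonymized, remaining)
```

The delete removes the one old row, the update then touches nothing, and the recent row is left exactly as it was, which matches the current behaviour of a config that only sets 'where'.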
Watching this, as I'm also interested in this feature. Until then, is it possible to run masquerade twice with two different configs?
I'm thinking I can run the anonymization, then export for a full anonymized backup. Then come back and run the delete on the same DB, then export a thin backup.
The only problem is that I need two different config file setups for this, correct? I guess I could run two different phars, each with their own config, but that doesn't seem very elegant.
@SAN1TAR1UM the --config parameter (which gives it a directory of config yml files) can be used multiple times, so you can use the same phar but just add an extra set of configs for one of the runs.
I read the whole thread and have a question: why not just have this in the YAML file:
    customer_grid_flat:
      provider:
        delete: true
        where: "`created_at` < now() - interval 30 day"
and this below:
    customer_grid_flat:
      columns:
        name:
          formatter: name
        email:
          formatter: email
          unique: true
          nullColumnBeforeRun: true
        dob:
          formatter: dateTimeThisCentury
          optional: true
        billing_full:
          ...
First block will clean, second one will anonymize?
@mehdichaouch I think multiple configs for the same table are ignored - the latest one wins, due to using array_merge here: https://github.com/elgentos/masquerade/blob/master/src/Elgentos/Masquerade/Helper/Config.php#L80
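A rough illustration of that "latest one wins" effect, using a Python dict merge as an analogue of PHP's array_merge on string keys (later values replace earlier ones, without merging nested arrays). It assumes the merge is keyed by table name, as described above; the two dicts mirror the two YAML blocks from the question.

```python
# First config block: only the provider (delete) settings.
first = {
    "customer_grid_flat": {
        "provider": {
            "delete": True,
            "where": "`created_at` < now() - interval 30 day",
        }
    }
}

# Second config block: only the column formatters for the same table.
second = {
    "customer_grid_flat": {
        "columns": {
            "name": {"formatter": "name"},
            "email": {"formatter": "email", "unique": True},
        }
    }
}

# Shallow merge: the second entry for the table replaces the first entirely.
merged = {**first, **second}

print(sorted(merged["customer_grid_flat"].keys()))
```

Under that assumption the provider block from the first config is dropped, so the delete would never run; only the columns section survives.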