Deleting a user should anonymize their activities

Open sbernhard opened this issue 5 years ago • 15 comments

When a user is deleted, all of their associations need to be anonymized according to DSGVO / GDPR.

Steps to reproduce

  1. A group folder is used by user A, B and C
  2. A creates a file F1 in this group folder
  3. B creates a file F2 in this group folder
  4. A changes the file F2
  5. Admin deletes user A

After step 3: The activity of F1 and F2 includes the user A.
After step 4: A is still in the activity log of F1 and F2.

Expected behaviour

After the admin deletes user A, F1 and F2 should have an "anonymized" entry so that not even the user name of A is shown.

Actual behaviour

The deleted user's name still appears in the activity of F1 and F2.

Server configuration

The following setting is configured so that activities of users in group folders are tracked: activity_use_cached_mountpoints = true

sbernhard avatar Sep 16 '20 20:09 sbernhard

Is this the correct app in which this needs to be addressed, @nickvergessen? Any hint / file / section / similar methods would be nice. If I can, I would implement this feature myself because it's important to me.

sbernhard avatar Sep 24 '20 21:09 sbernhard

Yeah, it might be correct here. The main issue is that the activity app stores whatever it is given, but each app that creates/renders activities should basically take care of that action itself.

Since this also has a rather drastic impact and is not always required, I'm not sure we can do that directly in the activity app, but we can discuss it a bit.

nickvergessen avatar Sep 24 '20 21:09 nickvergessen

If the information (the data behind the activities) is stored within Nextcloud core itself, and Nextcloud core has access to that data, it would probably be simpler to run a task when deleting a user. AFAIK, there is no "link" between file and user.name; instead, the user.name is written as plain data to the activity. That would mean the task needs to find all occurrences of the user.name, which could be pretty slow...

As I said, it's about DSGVO / GDPR, so it's not something that can be ignored when running Nextcloud in a company.

sbernhard avatar Sep 24 '20 21:09 sbernhard

> As I said, it's about DSGVO / GDPR, so it's not something that can be ignored when running Nextcloud in a company.

That depends entirely on your setup, your contracts, and where you do business, but let's ignore the reasoning for now.

> If the information (the data behind the activities) is stored within nextcloud core itself, and nextcloud core has access to that dat

Yeah, well, the problem is that user IDs are also stored in the subject/message parameters. Those can be JSON-encoded arrays without descriptive keys, so we cannot simply string-replace them in the user-deleted hook. That is why I meant that every app would basically have to take care of that procedure itself.

If you think you have to delete all traces of a user, something like this is what you would have to do:
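To illustrate the point about parameters, here is a minimal Python sketch. The parameter array below is made up for the example and is not taken from a real `oc_activity` row, and `naive_replace` / `replace_exact` are hypothetical helper names. It shows how a plain string replace damages unrelated values, while an exact-match walk over the decoded JSON avoids that but still cannot tell whether a matching value is a user ID or, say, a file name.

```python
import json


def naive_replace(raw: str, user_id: str, placeholder: str) -> str:
    """Plain substring replacement on the encoded parameter string."""
    return raw.replace(user_id, placeholder)


def replace_exact(value, user_id: str, placeholder: str):
    """Walk decoded JSON and replace only values equal to user_id."""
    if isinstance(value, str):
        return placeholder if value == user_id else value
    if isinstance(value, list):
        return [replace_exact(v, user_id, placeholder) for v in value]
    if isinstance(value, dict):
        return {k: replace_exact(v, user_id, placeholder) for k, v in value.items()}
    return value


# Hypothetical parameters: a path, the deleted user "anna", another user "annabel".
raw = json.dumps(["/Projects/anna-notes.txt", "anna", "annabel"])

print(naive_replace(raw, "anna", "deleted-user"))
# ["/Projects/deleted-user-notes.txt", "deleted-user", "deleted-userbel"]

print(replace_exact(json.loads(raw), "anna", "deleted-user"))
# ['/Projects/anna-notes.txt', 'deleted-user', 'annabel']
```

Because the arrays have no descriptive keys, even the exact-match variant is only a guess: only the app that wrote the parameters knows which position holds a user ID.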

-- Replace USERID, EMAIL, FIRSTNAME and LASTNAME with the deleted user's values.
DELETE FROM oc_activity
WHERE user = 'USERID'
   OR affecteduser = 'USERID'
   -- user ID anywhere in the free-form columns
   OR subjectparams LIKE '%USERID%'
   OR messageparams LIKE '%USERID%'
   OR link LIKE '%USERID%'
   OR file LIKE '%USERID%'
   -- email address
   OR subjectparams LIKE '%EMAIL%'
   OR messageparams LIKE '%EMAIL%'
   OR link LIKE '%EMAIL%'
   OR file LIKE '%EMAIL%'
   -- first name
   OR subjectparams LIKE '%FIRSTNAME%'
   OR messageparams LIKE '%FIRSTNAME%'
   OR link LIKE '%FIRSTNAME%'
   OR file LIKE '%FIRSTNAME%'
   -- last name
   OR subjectparams LIKE '%LASTNAME%'
   OR messageparams LIKE '%LASTNAME%'
   OR link LIKE '%LASTNAME%'
   OR file LIKE '%LASTNAME%';

This might delete too much and not enough at the same time. Alternatively, you can guarantee a date by which all traces will have been deleted (e.g. 30 days) and set the activity expiration to 30 days.
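As a rough illustration of "too much and not enough", the following Python sketch runs a shortened version of the query against an in-memory SQLite table. The table is a cut-down stand-in for `oc_activity` with only a few columns, and all rows and names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for oc_activity; the real table has more columns.
conn.execute(
    "CREATE TABLE oc_activity ("
    'activity_id INTEGER PRIMARY KEY, "user" TEXT, affecteduser TEXT, '
    "subjectparams TEXT, file TEXT)"
)
rows = [
    (1, "kim", "kim", '["/report.txt"]', "/report.txt"),   # by the deleted user
    (2, "bob", "bob", '["kimberly"]', "/shared.txt"),      # unrelated user "kimberly"
    (3, "bob", "bob", '["K. Lee"]', "/minutes.txt"),       # refers to kim by another name
    (4, "bob", "bob", '["/notes.txt"]', "/notes.txt"),     # unrelated
]
conn.executemany("INSERT INTO oc_activity VALUES (?, ?, ?, ?, ?)", rows)

# Shortened form of the DELETE: exact match on the ID columns,
# substring match on the free-form columns.
conn.execute(
    'DELETE FROM oc_activity WHERE "user" = :uid OR affecteduser = :uid '
    "OR subjectparams LIKE '%' || :uid || '%' OR file LIKE '%' || :uid || '%'",
    {"uid": "kim"},
)

remaining = [
    r[0] for r in conn.execute("SELECT activity_id FROM oc_activity ORDER BY activity_id")
]
print(remaining)  # [3, 4]
```

Row 2 is removed although it belongs to a different user (too much), while row 3 survives although it still identifies the deleted user (not enough).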

nickvergessen avatar Sep 24 '20 21:09 nickvergessen

Thanks for your explanation. Much appreciated.

Instead of deleting an entry, I would prefer to anonymize it if this is somehow possible (without changing all apps). This should be basic functionality of Nextcloud core or the activity app (AFAIK, the activity app is the source for "all" associations between files and user names).

"activity expiration to 30 days" <- what exactly do you mean? Is there a setting in which you can specify to remove the activity of a user after a certain time?

sbernhard avatar Sep 25 '20 07:09 sbernhard

Yeah, see https://docs.nextcloud.com/server/19/admin_manual/configuration_server/activity_configuration.html#configuring-your-nextcloud-for-the-activity-app

nickvergessen avatar Sep 25 '20 07:09 nickvergessen

Thanks. Well, it's also possible to add "comments" to a file, in which a user name is saved as well. That means this would be another source (out of many others...) that needs to be anonymized.

sbernhard avatar Sep 25 '20 07:09 sbernhard

Are comments, and other sources containing a "name" / "email" / other private data, also stored in the activity?

I guess the problem gets even worse if you think about groupware features like contacts / calendar, but that is another topic. For me, the first and most important goal is to make Nextcloud core itself, which is about file storage, DSGVO / GDPR compliant.

sbernhard avatar Oct 02 '20 20:10 sbernhard

I would say the expiration of Activities should take care of the basic use case...

We discussed at some point that we would need to write a 'hook' that apps can listen to and respond by offering (for export) or deleting data of a specific user. @blizzz has looked into that. He's on sick leave but if @sbernhard is interested in working on that I guess it might be worth a conversation. From our side, I don't think we'll implement this unless there is customer demand.

jospoortvliet avatar Oct 03 '20 05:10 jospoortvliet

Ok. Fair enough @jospoortvliet / @nickvergessen, but you should think about the GDPR's right to erasure ('right to be forgotten'): https://gdpr-info.eu/art-17-gdpr/. I'm pretty sure that companies, authorities, and universities are not able to use Nextcloud because Nextcloud doesn't respect this right, or they would tell you sooner or later that this is a 'no go'.

Or what do you tell customers they should do to respect the GDPR's 'right to be forgotten' in Nextcloud?

sbernhard avatar Oct 08 '20 06:10 sbernhard

Governments, universities and schools do use Nextcloud because we DO respect this right. The default expiration is 6 months, after which the activities will have been forgotten. If you don't see it like that, either reduce the duration or simply disable the activity app until something has changed.
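For readers unfamiliar with the idea, expiration boils down to removing entries older than a cutoff. The Python sketch below shows that idea only; it is not the activity app's actual cleanup code, `expire_activities` is a hypothetical helper, and the table is a two-column stand-in for `oc_activity` that assumes a Unix `timestamp` column.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def expire_activities(conn: sqlite3.Connection, expire_days: int, now: int) -> int:
    """Delete entries older than expire_days and return how many were removed."""
    cutoff = now - expire_days * SECONDS_PER_DAY
    cur = conn.execute("DELETE FROM oc_activity WHERE timestamp < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oc_activity (activity_id INTEGER PRIMARY KEY, timestamp INTEGER)")

now = 1_600_000_000  # fixed "current time" so the example is reproducible
conn.executemany(
    "INSERT INTO oc_activity VALUES (?, ?)",
    [(1, now - 10 * SECONDS_PER_DAY), (2, now - 40 * SECONDS_PER_DAY)],
)

removed = expire_activities(conn, expire_days=30, now=now)
print(removed)  # 1
```

With a 30-day window, only the 40-day-old entry is removed; a shorter window simply moves the cutoff closer to the present.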

nickvergessen avatar Oct 08 '20 06:10 nickvergessen

Thanks @nickvergessen. Now I understand the solution in Nextcloud. :-)

I would expect a solution in which I always know who added / changed a file, up to the point at which the user was deleted. After that, I would expect to see an entry like "deleted user" or "anonymous user".

sbernhard avatar Oct 08 '20 07:10 sbernhard

> Thanks @nickvergessen. Now I understand the solution in Nextcloud. :-)
>
> I would expect a solution in which I always know who added / changed a file, up to the point at which the user was deleted. After that, I would expect to see an entry like "deleted user" or "anonymous user".

Well, if that's required, some SQL queries can make that happen, though not very easily. That's what our customers pay us for: help with such specific use cases. That is why it isn't smart to run enterprise software without vendor support 🚀
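As a very rough idea of what such queries could look like, the Python sketch below rewrites the acting and affected user columns to a placeholder instead of deleting rows. It is an assumption-laden illustration, not a supported procedure: `anonymize_actor` and the `deleted-user` placeholder are hypothetical, the table is a cut-down stand-in for `oc_activity`, and the JSON parameter columns are deliberately left untouched.

```python
import sqlite3

PLACEHOLDER = "deleted-user"  # hypothetical placeholder, not a Nextcloud constant


def anonymize_actor(conn: sqlite3.Connection, user_id: str,
                    placeholder: str = PLACEHOLDER) -> int:
    """Replace exact matches in the two user ID columns; return changed values."""
    changed = 0
    for column in ("user", "affecteduser"):  # fixed column names, not user input
        cur = conn.execute(
            f'UPDATE oc_activity SET "{column}" = ? WHERE "{column}" = ?',
            (placeholder, user_id),
        )
        changed += cur.rowcount
    return changed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oc_activity ("
    'activity_id INTEGER PRIMARY KEY, "user" TEXT, affecteduser TEXT, subjectparams TEXT)'
)
conn.executemany(
    "INSERT INTO oc_activity VALUES (?, ?, ?, ?)",
    [(1, "kim", "kim", '["kim"]'), (2, "bob", "kim", "[]"), (3, "bob", "bob", "[]")],
)

print(anonymize_actor(conn, "kim"))  # 3
print(conn.execute("SELECT * FROM oc_activity WHERE activity_id = 1").fetchone())
# (1, 'deleted-user', 'deleted-user', '["kim"]')
```

The last line shows the remaining problem: the ID is still present in `subjectparams`, which is the part that is hard to rewrite generically.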

jospoortvliet avatar Oct 08 '20 07:10 jospoortvliet

Thanks a lot. Let's use the activity_expire_days option.

With this option, will that user's comments on a file also be "deleted" once the expiration period has been reached?

sbernhard avatar Oct 08 '20 22:10 sbernhard

About comments: when a user is deleted, the comments are left in place, but the acting user information is anonymized to "Deleted user" or something like this. IDs in mentions stay, but cannot be resolved to a name. The client decides how to present this; in the web interface we show "Unknown user". Mentions are just text in the comment that is matched and replaced on demand, and there is no index that tells us which user is mentioned where.
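A minimal Python sketch of the "matched and replaced on demand" idea follows. It is not the comments app's code: the `@name` pattern is a simplification chosen for the example, and `render_mentions` and the display-name lookup are hypothetical.

```python
import re

# Simplified mention pattern, assumed for this example only.
MENTION = re.compile(r"@(\w+)")


def render_mentions(text: str, display_names: dict, fallback: str = "Unknown user") -> str:
    """Resolve @id mentions at render time; unresolvable IDs get the fallback."""
    def resolve(match: re.Match) -> str:
        return "@" + display_names.get(match.group(1), fallback)

    return MENTION.sub(resolve, text)


# "kim" has been deleted, so only "bob" can still be resolved.
known_users = {"bob": "Bob B."}
print(render_mentions("Thanks @kim and @bob!", known_users))
# Thanks @Unknown user and @Bob B.!
```

The stored comment text is never changed here; only the rendered output differs once the ID can no longer be resolved, which matches the behaviour described above.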

blizzz avatar Oct 12 '20 22:10 blizzz