
GDPR: Prevent saving data from crawled web mentions

Open yatil opened this issue 7 years ago • 30 comments

When services like Brid.gy get webmentions from silos, the users there have no idea that their comments or likes, along with their name and profile picture, are shown on the article site. This is problematic for privacy reasons.

In the spirit of GDPR and avoiding collecting and storing data in the first place, we should anonymize the data before it is stored in the database. Once the webmention is verified and comes from bridgy, it should result in a comment like “Someone liked this on twitter.com” with no identifiable information of the person who liked it.
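A minimal sketch of what that anonymization step could look like, in Python for brevity rather than the plugin's PHP. The dictionary keys (`source`, `target`, `type`, `url`, `author_name`, and so on) are hypothetical and only stand in for whatever the parser produces; nothing here is the plugin's actual data model.

```python
from urllib.parse import urlparse

# Hypothetical response types and the verbs used in the anonymous text.
VERBS = {"like": "liked", "repost": "reposted", "reply": "replied to"}


def anonymize_mention(mention: dict) -> dict:
    """Return a reduced copy that keeps no name, photo, profile URL, or text."""
    # Assumption: "url" holds the post's own URL (e.g. the tweet). Prefer it
    # over the proxy's source URL so a like delivered by Bridgy is attributed
    # to twitter.com rather than brid.gy.
    origin = mention.get("url") or mention["source"]
    host = urlparse(origin).hostname or "an unknown site"
    verb = VERBS.get(mention.get("type"), "mentioned")
    return {
        "type": mention.get("type", "mention"),
        "target": mention["target"],
        "origin_host": host,
        "text": f"Someone {verb} this on {host}",
    }
```

Only the host is kept, not the full source URL, because a proxy URL can itself contain a user identifier.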

yatil avatar May 06 '18 09:05 yatil

Sure, but we shouldn't limit it to brid.gy. A general way to handle this would be to check whether the referrer is on a different domain than the source URL. If so, it should be anonymized.
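One possible reading of that check, sketched in Python. It assumes "referrer" means the URL of the original post or profile that the mention describes; the function names are made up for illustration, and comparing bare hostnames is a simplification (it does not handle shared registrable domains such as subdomains).

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased hostname with a leading "www." removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_proxied(source_url: str, original_url: str) -> bool:
    """True when the mention is served from a different host than the
    content it describes, which suggests a proxy or backfeed service."""
    return _host(source_url) != _host(original_url)
```

With this reading, a like backfed from a silo would be flagged, while a post on someone's own site linking to yours would not.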

pfefferle avatar May 06 '18 09:05 pfefferle

I haven't implemented something like that, because I am not sure if it is better to disable brid.gy completely.

pfefferle avatar May 06 '18 09:05 pfefferle

I agree with not limiting it to brid.gy – I would totally like to keep the functionality of knowing that a blog post got traction without knowing exactly who it is, but it might just be overhead. Another idea would be to add something to the Webmention spec that allows the mention to specify the level of data that you are allowed to save about the Webmention. I tried (badly) to summarize my thoughts here: https://github.com/w3c/webmention/issues/96

yatil avatar May 06 '18 09:05 yatil

Sure, but the problem is that the spec says almost nothing about the parsing or handling of the Webmention at all. The whole microformats part is discussed under the umbrella of the IndieWeb community.

pfefferle avatar May 06 '18 09:05 pfefferle

That's why I split the functionality into two plugins. If you are using only the Webmention plugin (which simply implements the Webmention spec), you get exactly what you expect: "This Article was mentioned on twitter.com". The Semantic-Linkbacks plugin tries to make it human-readable following the IndieWeb principles.

pfefferle avatar May 06 '18 09:05 pfefferle

Ah, that makes sense!

yatil avatar May 06 '18 09:05 yatil

Perhaps we should also rethink brid.gy. Why should I, as a site owner, register at brid.gy to get likes to my tweets? Why not build a service where Twitter users can register to send pings to the sites they like or tweet? Something like what Flattr tried some years ago: https://blog.flattr.com/2013/04/twitter-is-forcing-us-to-drop-users-ability-to-flattr-creators-by-favoriting-their-tweets/

pfefferle avatar May 06 '18 10:05 pfefferle

@yatil but if you deactivate the Semantic Linkbacks plugin, the "classic Webmentions" also look like that: "This Article was mentioned on yatil.de"

pfefferle avatar May 06 '18 10:05 pfefferle

Additional information: https://sebastiangreger.net/2018/05/indieweb-privacy-challenge-webmentions-backfeeds-gdpr/

pfefferle avatar May 06 '18 10:05 pfefferle

I will assist in this, but I would not use it.

dshanske avatar May 06 '18 11:05 dshanske

Why should I, as a site owner, register at brid.gy to get likes to my tweets. Why not build a service where twitter users can register to send pings to sites, they like or tweet?

bridgy does this too! it backfeeds responses to your tweets, but it also tries to send outgoing webmentions to every link you put in your tweets.

A general way to handle this, would be to check if the referrer is a different domain as the source URL. If so, it should be anonymized.

do we have other common examples of this? I'd be reluctant to generalize just yet if bridgy is the only common one so far.

snarfed avatar May 06 '18 17:05 snarfed

we have manual Webmentions using the comment-forms or the endpoint forms...

pfefferle avatar May 06 '18 18:05 pfefferle

bridgy does this too! it backfeeds responses to your tweets, but it also tries to send outgoing webmentions to every link you put in your tweets.

wasn't aware of that, thanks for the info!

pfefferle avatar May 06 '18 18:05 pfefferle

honestly, i think the indieweb community has been way overthinking GDPR. we've been wringing our hands about it a ton, but i really don't expect it will affect us very much at all. in practice, almost all of what we're doing here is tiny personal web sites, which realistically will never get sued or anything similar.

at the risk of stereotyping: engineers and technical types like us like to think about laws because they're big fine grained complicated sets of rules, which we're comfortable with and attracted to...but that's not usually a good use of our time. i think the vast majority of our time spent on GDPR would be better spent on UX etc instead.

for bridgy specifically, here's our current thinking about its compliance: https://brid.gy/about#gdpr

(also, i'm just talking about laws specifically here, not ethics. ethics is worth thinking about! eg sgreger's great post, and the discussion in its comments. legal compliance though, meh. not a high priority for us in practice.)

snarfed avatar May 06 '18 22:05 snarfed

I lean with @snarfed on this. It's why all of my suggestions involve UI and documentation. You don't want to use X... turn it off.

dshanske avatar May 07 '18 00:05 dshanske

As a tech freelancer, none of my websites is considered “tiny personal” by the GDPR law. There is already an industry in Germany that sends Abmahnungen for the smallest of violations.

yatil avatar May 07 '18 07:05 yatil

@yatil That is why it becomes a settings issue. I am fine with adding settings to allow you to adjust the granularity of the response. But for those who take a different view for whatever reason, they should have the option as well.

dshanske avatar May 07 '18 11:05 dshanske

Sure +1 for a setting :-)

yatil avatar May 07 '18 11:05 yatil

@yatil I've stopped my other plans to help build these settings for all Indieweb plugins because I know it is a community concern to at least do some of this before May 25th.

dshanske avatar May 07 '18 11:05 dshanske

I'm with @dshanske, it should be a setting, not a mandatory thing forced on everyone.

armingrewe avatar May 07 '18 12:05 armingrewe

I did not want to suggest it would be mandatory for everyone, sorry if it came across like that.

yatil avatar May 07 '18 12:05 yatil

I didn't think you did, but wanted to define it

dshanske avatar May 07 '18 12:05 dshanske

fwiw, this still seems to potentially apply to all webmentions, not just bridgy or other proxies.

for example, if a site links to you, but it doesn't send a wm, someone else can still send a wm, and that site's author name, picture, will end up in your responses. if they don't want that, due to GDPR or whatever, the concern here still applies.

so if we want an "anonymize" option, ok, but we probably want to make it global.
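A rough sketch of how such a global setting with adjustable granularity might be modelled, again in Python for brevity. The three levels and the field names are invented for illustration; the thread does not specify what the actual options would be.

```python
from enum import Enum


class Privacy(Enum):
    """Hypothetical site-wide levels for how much of a mention is stored."""
    FULL = "full"            # store everything the parser found
    NAME_ONLY = "name_only"  # keep the name, drop photo and profile URL
    ANONYMOUS = "anonymous"  # drop all author data and the quoted content


# Fields removed at each level (field names are assumptions).
_DROPPED = {
    Privacy.FULL: set(),
    Privacy.NAME_ONLY: {"author_photo", "author_url"},
    Privacy.ANONYMOUS: {"author_name", "author_photo", "author_url", "content"},
}


def apply_privacy(mention: dict, level: Privacy) -> dict:
    """Filter a parsed mention before saving, regardless of its sender."""
    dropped = _DROPPED[level]
    return {key: value for key, value in mention.items() if key not in dropped}
```

Because the filter runs on every incoming mention, it covers proxies, manual submissions, and third-party senders alike, which is the point of making the option global.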

snarfed avatar Jun 01 '18 22:06 snarfed

Why bother at all? If someone posts a like, reply or retweet publicly, he/she cannot prevent it from being sent/copied etc. all over the net. This should already be in the privacy statement of the original service. Just like WordPress does for its native comments. I wouldn't use a fully anonymized plugin. It defeats the purpose of webmention for me, social interaction. An option to enable it would be fine however.

metbril avatar Aug 20 '18 05:08 metbril

@metbril Because GDPR does not allow it. If you save the data, then you're responsible for it, no matter if the user posted it publicly or not. It is personal information you keep, and the person saving the data is responsible for informing the user that it is collected (which does not happen with 99% of the websites, so one cannot expect the user to know) and for giving them options to remove the data from your website. Some might argue that this needs explicit consent before saving any data (I don’t think so). OR you can avoid trouble by anonymizing the data immediately.

yatil avatar Aug 20 '18 05:08 yatil

@yatil

the person saving the data is responsible for informing the user that it is collected (which does not happen with 99% of the websites, so one cannot expect the user to know)

Would this imply that, for example, Google is in violation of GDPR by indexing the Twitter website and storing personally identifiable information along the way?

metbril avatar Aug 20 '18 07:08 metbril

Not a violation, but they have to be transparent about what they save, and they have to give you the option to request the information they have about you and the option to have it deleted.

The other aspect of Google is that they only "cite" information... We build a context and send texts to other pages, where they can be commented on in another context.

pfefferle avatar Aug 20 '18 07:08 pfefferle

It is not about good/bad or correct/wrong, it's more about transparency, so that the user has a choice.

pfefferle avatar Aug 20 '18 07:08 pfefferle

@metbril No, because when you delete a tweet and Google reindexes the page, the search result will also be deleted. Same when you put your profile in private mode. There’s nothing that would reflect that change via webmentions. (If Google continued to show the entry, they would be in violation.)

yatil avatar Aug 20 '18 10:08 yatil

@yatil kind of... If you delete a post, the Webmention plugin also sends a delete request... But in the end it depends on the other party to support the deletion...
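For reference, a sketch of the receiving side in Python. As I read the Webmention spec, a sender re-sends the webmention after deleting a post, and a receiver that re-fetches the source and gets `410 Gone`, or a `200 OK` page that no longer links to the target, should remove the stored mention or mark it as deleted. The function name is made up, and the substring check is a simplification of real link verification.

```python
def should_remove(status_code: int, body: str, target: str) -> bool:
    """Decide whether a previously stored mention should be removed after
    its source URL has been fetched again."""
    if status_code == 410:
        # The source was deleted and the server says so explicitly.
        return True
    if status_code == 200 and target not in body:
        # The page still exists but no longer links to the target.
        # Simplification: a real check would parse the HTML for links.
        return True
    # Anything else (still linked, temporary errors) keeps the mention.
    return False
```

Whether stored data actually disappears still depends on the receiver implementing this step.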

pfefferle avatar Aug 20 '18 11:08 pfefferle