
Add an option to disable fingerprinting / config_id entirely

Open MoritzLost opened this issue 3 years ago • 46 comments

Update - the specs

  • Fix in 5.0.0 - See the proposed solution in https://github.com/matomo-org/matomo/issues/18448#issuecomment-1512338767
  • Ideally we would backport but it might be OK in that case not to backport

Summary

This is a continuation of #16361 and the related discussion in the forums. Right now, the stance of Matomo is that in cookie-less mode, it doesn't use fingerprinting so it doesn't require consent. Under GDPR this line of thought is perfectly sound and reasonable. However, under the ePrivacy directive (which has been implemented in German law since 1 December 2021 with the TTDSG), this is once again a debatable point.

The key phrase in the ePrivacy directive is the 'access to information' already stored on the user's device. Of course, it's not entirely clear what this means in practice. But call it a fingerprint or not, Matomo does access some data from the user's device to create the config_id (in particular, the screen size and supported mime types). It's really up for debate if this constitutes 'access to information' – but as long as there's no legal precedent, some clients will want to cover all their bases. So it will once again be necessary to require consent before using Matomo, even without cookies or 'real' fingerprinting in place. So having an option to disable the device detection further would be great. Instead, the config_id could be based solely on User-Agent and anonymised IP address (which the server receives implicitly, so no data access is required).
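To make the proposal concrete, here is a rough sketch of an ID derived only from the User-Agent and an anonymised IP. This is not Matomo's actual implementation: the function names, the choice of SHA-256, and masking the last two IPv4 octets are all assumptions made for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: zero the trailing octets of an IPv4 address.
// Masking two octets is an assumption, not a statement about Matomo's defaults.
function anonymiseIpv4(ip, maskedOctets = 2) {
  const parts = ip.split('.');
  if (parts.length !== 4) {
    throw new Error('expected an IPv4 address');
  }
  for (let i = 4 - maskedOctets; i < 4; i++) {
    parts[i] = '0';
  }
  return parts.join('.');
}

// Hypothetical config ID: a short hash over data the server receives
// with every HTTP request anyway (User-Agent header and client IP).
function serverSideConfigId(userAgent, ip) {
  return crypto
    .createHash('sha256')
    .update(userAgent + '|' + anonymiseIpv4(ip))
    .digest('hex')
    .slice(0, 16);
}
```

Nothing here is read from the device via JavaScript, which is the point of the proposal. The price is that the ID carries much less entropy than one that also includes screen size and plugins.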

Of course, this would further limit the usefulness of some reports. But this would be acceptable in order to be 100% sure that using Matomo without consent is completely safe, legally speaking.

MoritzLost avatar Dec 03 '21 11:12 MoritzLost

Thanks for this @MoritzLost

When this new feature is disabled, what would you expect to happen? What information would be accessed vs not accessed? Or would it only use the IP address (meaning when different people in the same company visit the site then all these people are grouped into the same "visit")? Or maybe even every single action would create a new visit and not even the IP looked at?

It'd be great to hear your thoughts, as we may consider working on this.

The biggest problem would be to customise the tracking code based on this setting which could be quite hard.

tsteur avatar Dec 05 '21 22:12 tsteur

The key phrase in the ePrivacy directive is the 'access to information' already stored on the user's device. O

Do you maybe have a link to the page where this is mentioned?

tsteur avatar Dec 05 '21 22:12 tsteur

@tsteur Thanks for the reply!

Do you maybe have a link to the page where this is mentioned?

This comment on issue #16361 has more information:

If Art. 5 para. 3 ePrivacy Directive applies consent is mandatory to proceed. This law does not refer to cookies. It refers to "the gaining of access to information already stored, in the terminal equipment of a subscriber or user". The most relevant question is whether Javascript Tracking means gaining access to information already stored in the enduser's device. Art. 5 para. 3 ePrivacy Directive describes one scenario when consent is not required. This is the case if access to the enduser's device is "strictly necessary in order for the provider of an information society service explicitly requested by the subscriber or user to provide the service". This exemption does not cover analytics because no user or visitor "explicitly requests" to analyze his website or app usage. The relevant publication regarding this matter is Opinion 09/2014 by the Article 29 Working Party on device fingerprinting. The Opinion states under 7.1: "first-party website analytics through device fingerprinting do not fall under the exemption defined in CRITERION A or B and consent of the user is required."

In the TTDSG, the German implementation of the ePrivacy directive, there's § 25 para. 1, which is pretty much a literal translation of Art. 5 para. 3 ePrivacy Directive.

When this new feature is disabled, what would you expect to happen? What information would be accessed vs not accessed?

It's a bit difficult to say what information can still be used without consent based on the ePrivacy Directive / TTDSG, and I'm a developer, not a lawyer. But I've read multiple blog posts and 'expert opinions' (for example this one in German). The consensus seems to be that everything that is implicitly available to the server as a by-product of the HTTP request is 'safe' to access. This would be, at most, the IP address and User-Agent. Though not even that is completely certain, since the request to Matomo was not 'explicitly requested' by the user, so it's not 'strictly necessary' (quoting the relevant phrases from the ePrivacy Directive here).

That said, I'm more concerned with the data Matomo collects using JS (screen size, mime types). Of course there's much ambiguity here, but that could be considered 'access to information already stored' on the user's device. So an option to disable that would ideally leave that part out entirely, so the server only gets the IP Address and User-Agent (and any other 'required' HTTP headers that might be useful) to create the config_id.

Or would it only use the IP address (meaning when different people in the same company visit the site then all these people are group into the same "visit")? Or maybe even every single action would create a new visit and not even the IP looked at?

Either of those could be reasonable. If the config_id is created only based on User-Agent and IP address, this would lead to a lot of collisions, in particular if the IP addresses are anonymized (which they need to be for GDPR reasons). Though it could still work well enough for small and medium sites.
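A small sketch of the collision problem described above. The `visitorKey` helper and the sample requests are hypothetical, and masking the last two IPv4 octets is again an assumption: three people on the same office network end up with only two distinct keys because two of them share a browser.

```javascript
// Hypothetical visitor key built only from data sent with the HTTP request.
function visitorKey(request) {
  // Keep the first two octets, zero the rest (assumed anonymisation level).
  const maskedIp = request.ip.split('.').slice(0, 2).concat('0', '0').join('.');
  return request.userAgent + '|' + maskedIp;
}

// Number of "visitors" the server could tell apart.
function countDistinctKeys(requests) {
  return new Set(requests.map(visitorKey)).size;
}

// Three different people behind the same office network.
const office = [
  { ip: '198.51.100.7', userAgent: 'Mozilla/5.0 (X11; Linux x86_64) Firefox/95.0' },
  { ip: '198.51.100.23', userAgent: 'Mozilla/5.0 (X11; Linux x86_64) Firefox/95.0' },
  { ip: '198.51.100.42', userAgent: 'Mozilla/5.0 (X11; Linux x86_64) Chrome/96.0' },
];

console.log(countDistinctKeys(office)); // 2: the two Firefox users collapse into one
```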

Maybe the distinction between visitor, visit and page view could be dropped entirely in this case? This would greatly reduce the usefulness of some reports (bounce rate, session length etc), but that would be a tradeoff to consider. Cloudflare Web Analytics does something similar. They only distinguish between visits and page views. A unique 'visit' is recorded if the web request came from a different website (determined by the HTTP Referer header). Of course this leaves a wide margin for error, but it's still accurate enough to get basic visitor counts and page view trends. Though I'm aware that this would be a massive rework.
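As a sketch of that referrer-based approach (my reading of the idea, not Cloudflare's or Matomo's code): a page view counts as a new visit when the Referer points at a different hostname. Treating a missing or unparseable Referer as a new visit is an assumption I've added, and `isNewVisit` / `summarise` are hypothetical names.

```javascript
// Decide whether a page view starts a new "visit" using only the Referer header.
function isNewVisit(pageUrl, referer) {
  if (!referer) {
    return true; // assumption: direct entry (no Referer) counts as a visit
  }
  try {
    return new URL(referer).hostname !== new URL(pageUrl).hostname;
  } catch {
    return true; // assumption: an unparseable Referer is treated as external
  }
}

// Aggregate hits into the two metrics that survive in this model.
function summarise(hits) {
  let visits = 0;
  for (const hit of hits) {
    if (isNewVisit(hit.url, hit.referer)) {
      visits++;
    }
  }
  return { pageViews: hits.length, visits };
}
```

No identifier is needed at all in this model, which is why visitor-level reports (returning visitors, session length) cannot be derived from it.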

As another option, disabling some reports that just don't work if the config_id is not reliable would make sense. Or even just adding warnings to the interface where reports might be skewed by that fact? Based on my interactions with clients, having reduced accuracy of reports is acceptable if it means we can use Matomo without requiring consent. Those clients only need to be able to tell which reports are reliable in that mode and which are not.

Getting some opinions from other people here would be great, in general the situation is pretty murky right now. Maybe the best course of action right now is to wait for some legal precedent / test cases which will clear up what constitutes 'access to information' …

MoritzLost avatar Dec 06 '21 09:12 MoritzLost

@MoritzLost thanks for this. I will have another read and think later. Will see if we can maybe get more eyes on this through the forum or so.

Just wanted to already mention that we're working on a feature to disable some things when data is less accurate when we don't have reliable config_id etc see https://github.com/matomo-org/matomo/pull/16773

tsteur avatar Dec 06 '21 19:12 tsteur

Also just FYI as a "workaround" someone could technically fall back to log analytics (Log analytics on GitHub) if eg events or other features aren't being used. Of course that might not always be the case and maybe "Opt out" could be partially problematic.

tsteur avatar Dec 06 '21 19:12 tsteur

I guess generally it be mostly about disabling this code to run Matomo in a way to not use this data for the fingerprint if someone wanted. The most unique thing in there is likely the resolution as plugins are often quite similar. So disabling this would actually not make a huge difference. Generally, we could provide a tracker option likely to not send this data along with a tracking request (and ideally also to not even access it which should be the case automatically when not using cookies or cross domain linking feature which won't be useful without cookies anyway as far as I remember).

image

FYI I've contacted our data protection officer to potentially get some insights into this. It may take a while until we hear back.

tsteur avatar Dec 06 '21 20:12 tsteur

@tsteur Thanks! Yeah, would be great to get some more opinions on this.

Just wanted to already mention that we're working on a feature to disable some things when data is less accurate when we don't have reliable config_id etc see #16773

That sounds promising, maybe this would pave the way to disable the feature detection completely without messing up the reports too much. Having reduced accuracy is very acceptable if the interface clearly communicates what it entails, like hiding reports that don't work in this case.

Also just FYI as a "workaround" someone could technically fall back to log analytics (Log analytics on GitHub) if eg events or other features aren't being used. Of course that might not always be the case and maybe "Opt out" could be partially problematic.

Yeah, log analytics is always the final fallback option, it doesn't require consent or opt-out at all (as long as IP addresses are anonymized). Though they're just not as reliable as JS-based trackers.

I guess generally it be mostly about disabling this code to run Matomo in a way to not use this data for the fingerprint if someone wanted. The most unique thing in there is likely the resolution as plugins are often quite similar. So disabling this would actually not make a huge difference. Generally, we could provide a tracker option likely to not send this data along with a tracking request (and ideally also to not even access it which should be the case automatically when not using cookies or cross domain linking feature which won't be useful without cookies anyway as far as I remember).

Yeah, the browser feature detection is definitely the main point. An option that just skips calling this function would be great, this should clear up any doubts regarding the 'access to information' clause. Or maybe only the screen size could be used? Arguably, the screen size does not constitute 'access to information already stored'. Though that's still up for debate, it really comes down to legal precedent I guess …

FYI I've contacted our data protection officer to potentially get some insights into this. It may take a while until we hear back.

Thanks, would be great to get an expert opinion on this specific use-case!

MoritzLost avatar Dec 07 '21 10:12 MoritzLost

Just fyi @MoritzLost I read https://www.heise.de/hintergrund/Was-sich-mit-den-neuen-Cookie-Regelungen-aendert-6278440.html today and it sounded more like consent may be needed when you previously stored data and then read that data. I think what's meant is the case where you don't store a cookie, but store an identifier in localStorage or sessionStorage or so and want to read it. I don't think what's meant by it is the resolution or plugin features. I'm still waiting to get more feedback though, likely mid next week.

tsteur avatar Dec 08 '21 20:12 tsteur

@tsteur I'm not sure I find that article convincing – focussing the whole discussion on cookies (or related technologies like localStorage) is a very myopic reading of the terms of the ePrivacy directive. The article basically states that the directive (and the TTDSG by extension) only talk about accessing information that you put there yourself – so cookies etc. But that isn't supported by the wording in the ePrivacy directive / TTDSG at all, which only mention access to information already stored [on the user's device], without mentioning what kinds of information this applies to. So it comes down to whether device features (screensize, plugins etc) can be considered 'information' in this sense.

Anyway, I would really like to believe that article, since it would make my life much easier … but I'm not sure that this interpretation is supported by the actual text of the law. Anyway, I'm really out of my depth here.

MoritzLost avatar Dec 09 '21 11:12 MoritzLost

Hello, I haven't read this whole thread in detail, but here are my findings so far looking at the latest ePrivacy draft, especially Article 8: https://github.com/matomo-org/matomo/issues/15425#issuecomment-993160031 - if you have any feedback or questions I'm keen to hear :key: I'm not sure yet how Article 5 and Article 8 cohabit or interact.

mattab avatar Dec 14 '21 05:12 mattab

Thanks to @MoritzLost and @tsteur for the valuable input. I agree that this issue is completely arguable.

One question is whether the term "information" in Article 5 ePrivacy Directive and § 25 TTDSG covers all kind of data or needs to be read in a limited meaning like "information that has been created by human action". The consequence of this legal argument could be whether screen size and similar technical information are covered by the law or not.

The second question is whether the term "stored" means stored by anybody or by the person/organisation which wants to process the information. For example the applications and fonts installed on a device are information stored by human action on a device (and by some services they are used to create a fingerprint). But unlike cookies or data in the local storage of a browser storing such information has not been initiated by the external party running the analytics tool.

For Germany the Datenschutzkonferenz (DSK; the common body of the regional and federal supervisory authorities) has announced that it will publish an update to its guideline on web analytics. It should have been published already. By now it seems more likely to be published in February. But most likely the DSK will address the questions raised here. In general the DSK supports the approach by Matomo but their lawyers need to find a common understanding in interpreting the law.

Daten-David avatar Dec 14 '21 09:12 Daten-David

@MoritzLost here's an update:

Even though there has been a lot of talk about sec. 25 of the new German Telemedia and Telecommunications Privacy Act (TTDSG), it is really only a (very late) implementation of the 2009 EU ePrivacy Directive, in this case its art. 5(3), into German law. It's nothing new basically. Meaning when we look at TTDSG we pretty much have to look at ePrivacy.

The consent requirement in the ePrivacy Directive was written at a time when cookies were the main technology to track users online, but today, the 2009 law is interpreted to apply to all tracking technologies and not just to cookies.

However, there is an exception from the consent requirement in art. 5(3)(2) ePrivacy Directive (and in sec. 25(2)(2) TTDSG) for so-called "strictly necessary" tracking technologies, which therefore do not require consent. The French data protection authority CNIL has taken the position that this exception also applies to "lean" analytics providers (such as Matomo). This means that the consent requirement of art. 5(3)(1) ePrivacy Directive (and sec. 25(1) TTDSG) does not apply to Matomo because its tracking technology is considered "strictly necessary". Unfortunately, the German data protection authorities have remained silent on this very question, but the CNIL interpretation could be relied on until further notice.

In the same spirit you could also rely on CNIL's decision and technically even use cookies without consent as long as you follow their conditions and don't track personal data (as then GDPR applies).

This is our interpretation here. Don't take it as legal advice.

Nonetheless, for people who want to have even stronger privacy, I think it would be great to add a tracker method to disable the usage of browser features for config_id/fingerprint. It's quick to do and for low traffic sites it shouldn't have a huge impact.

tsteur avatar Dec 14 '21 19:12 tsteur

@tsteur I'd love to live in France 😏

However, there is an exception from the consent requirement in art. 5(3)(2) ePrivacy Directive (and in sec. 25(2)(2) TTDSG) for so-called "strictly necessary" tracking technologies, which therefore do not require consent.

Not to beat a dead horse, but the strictly necessary part is limited to a 'service explicitly requested by the subscriber or user to provide the service' (art (5)(3) ePrivacy). I'm not sure how to go from there to the CNIL statement – if a visitor visits a website, Matomo isn't 'strictly necessary' to provide the webpage. And the request to Matomo itself hasn't been 'explicitly requested'. Maybe it can be argued that basic analytics/visitor statistics are strictly necessary for ongoing support and development of a website?

Anyway, having an option like this would be great to be able to cater to extremely careful clients.

MoritzLost avatar Dec 15 '21 15:12 MoritzLost

@MoritzLost check out 50-52 https://www.cnil.fr/sites/default/files/atoms/files/lignes_directrices_de_la_cnil_sur_les_cookies_et_autres_traceurs.pdf

Translating this, it says that it considers the use of attendance/performance statistics as required. That's if you eg don't also use the data for other purposes etc. Hence they exempt you from needing to ask for any consent when you use Matomo in a certain way. Since they all implement the same European directive into national law you can argue the same applies to Germany or other countries until they take a different stance. This is after chatting with some expert lawyers etc. Someone might see this differently though.

If you want to not look at any device data, then the above tracker method may help 👍

tsteur avatar Dec 15 '21 19:12 tsteur

@tsteur and @MoritzLost

I don't know whether it helps but I am pretty convinced both of you are right. CNIL (in accordance with many lawyers) adopted a flexible approach for interpretation of Art 5 para 3 ePrivacy Directive (same as German § 25 TTDSG). Probably as many lawyers don't go with this interpretation.

Until the ECJ (European Court of Justice) has decided on this issue nobody will legally know for sure. As far as I am aware there is no case pending on this matter at the ECJ yet. So it might take some years till we know for sure. Probably the ePrivacy Regulation as a replacement for the ePrivacy Directive will be enacted at about the same time.

What will happen first will be the publication by the German authorities (and most likely by other national data protection authorities) of their guidelines. If the German guideline agrees with the French one we are all one major step ahead. If they don't agree...

If Matomo wants to step out of legal uncertainty it could go forward and limit access to all device data which is not sent into server-side analytics anyway. I am not aware of how much this move is going to limit the quality of the analytics results. Hence I don't know how painful such move might be.

Daten-David avatar Dec 15 '21 22:12 Daten-David

@Daten-David 👍 As part of this issue we'll add a new method that gives people the option to not detect these browser features and not send them to the server. For smaller sites this shouldn't have too much of an impact typically (unless you get a lot of traffic from different people from the same company for example). For higher traffic ones, different visitors may more often be grouped into the same visit, but it shouldn't make that much of a difference I would expect.

tsteur avatar Dec 15 '21 22:12 tsteur

@tsteur The DSK (meeting of the German supervisory authorities) published its guideline on § 25 TTDSG yesterday.

In German: https://www.datenschutzkonferenz-online.de/media/oh/20211220_oh_telemedien.pdf

An English version has been announced.

On page 8 the DSK points out that fingerprinting is considered access to information stored on the device and hence § 25 TTDSG is applicable to fingerprinting. The DSK doesn't provide a lot of arguments for this view but simply refers back to the old paper on device fingerprinting by the Article 29 Working Party which came to the same conclusion.

Different to the French view by CNIL the DSK doesn't accept web (or app) analytics as "strictly necessary" to provide a "service requested by the enduser". Hence consent is mandatory.

Looks like there is no way to keep fingerprinting via javascript active without collecting consent first.

And collecting consent doesn't become easier if you follow the guideline. I won't go into details but the guideline sounds very much like no consent management tool known to me today is capable of collecting valid consent.

The whole issue stays hot. And it looks like a big u-turn back to the 90s of pure server-side web analytics.

Daten-David avatar Dec 22 '21 15:12 Daten-David

In the meantime, so-called browser fingerprinting is also often used. This refers to the process of forming, on the server side, a (hash) value or image that is as unique and long-lived as possible, as the result of a mathematical calculation over browser information such as screen resolutions, operating system versions or installed fonts.

I put one section through a translation service so other people can understand it as well. FYI the above talks about a long-lived (hash) value. In Matomo, this hash is not long-lived and lives for at most 24 hours. If a visitor visits the site at 8am, then the hash already changes again 16 hours later. You could maybe argue that because the hash is not long-lived it's not considered a fingerprint and it may be fine. Everyone will see this differently though. I know it talks about this more generically further down.
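To illustrate the "not long-lived" point, here is a minimal sketch of an ID that rotates at the day boundary. It is not how Matomo computes its hash: mixing the UTC calendar day into an HMAC, the `secret` parameter, and the function name are all assumptions chosen to show the 8am / 16 hours later behaviour.

```javascript
const crypto = require('node:crypto');

// Hypothetical short-lived ID: the UTC calendar day is part of the hashed
// input, so the same visitor gets a different ID once the day changes.
function dailyConfigId(userAgent, anonymisedIp, secret, now) {
  const day = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return crypto
    .createHmac('sha256', secret)
    .update(day + '|' + userAgent + '|' + anonymisedIp)
    .digest('hex')
    .slice(0, 16);
}
```

With this scheme a visit at 08:00 and one at 23:59 on the same day share an ID, while a visit at 00:00 the next day (16 hours after 08:00) does not, so IDs cannot be joined across days.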

I'll move this issue into the current sprint and not next sprint so we can offer such a method to not access this data sooner.

tsteur avatar Dec 22 '21 20:12 tsteur

Yesterday I missed an additional announcement by the DSK in its press release regarding the new guideline. The DSK announced that it will initiate a public consultation on its guideline. Details will follow. See last paragraph of press release (in German): https://www.datenschutz-berlin.de/fileadmin/user_upload/pdf/publikationen/DSK/2021/2021-DSK-PM-OH_Telemedien.pdf It might be helpful if Matomo could provide arguments into the consultation which pick up the positive view of the CNIL. There might still be a chance that following the consultation the German DSK might swing to the more liberal CNIL view.

Daten-David avatar Dec 23 '21 10:12 Daten-David

Great, thanks a lot for pointing this out @Daten-David. We'd be keen to follow up there. In case we miss it, it'd be great if you could mention it when you hear more about the details.

tsteur avatar Dec 23 '21 18:12 tsteur

As a user, this is all very confusing. I have a main server in France from a French company that proxies Matomo requests to an analytics server in the UK which hosts the Matomo software. Some of our users are from Germany. So do I have to worry about German privacy law interpretations on fingerprinting applying, even though I am in the UK and the server is in France? 🤷🏻‍♂️

GreenReaper avatar Dec 31 '21 04:12 GreenReaper

@peterhashair it'd be good as part of this issue to still add the test that browser features aren't sent to the server after the new method is called and a tracking request fired.
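A sketch of what such a check could look like, outside of Matomo's own test suite. The parameter list is my understanding of the browser-feature parameters in the Tracking HTTP API (`res` for resolution plus the plugin flags) and should be verified against the current docs; the helper name and the example URLs are hypothetical.

```javascript
// Query parameters that carry browser features in a tracking request
// (assumed list: screen resolution plus plugin detection flags).
const BROWSER_FEATURE_PARAMS = ['res', 'pdf', 'qt', 'realp', 'wma', 'fla', 'java', 'ag'];

// Return the browser-feature parameters present in a tracking request URL.
// An empty array means none of them were sent.
function leakedBrowserFeatures(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  return BROWSER_FEATURE_PARAMS.filter((name) => params.has(name));
}
```

In a test one would capture the request fired after calling the new method and assert that `leakedBrowserFeatures(url)` is empty.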

Also the documentation updates (at least two docs, I think maybe you've already added one).

Thanks

justinvelluppillai avatar Jan 17 '22 03:01 justinvelluppillai

And it would be great to allow users to call a method enableBrowserFeatureDetection.

fyi @GreenReaper @Daten-David @MoritzLost an FAQ has been created for this new feature that will be included in Matomo 4.7

https://matomo.org/faq/how-do-i-disable-browser-feature-detection-completely/

@peterhashair fyi I tweaked the FAQ to make a few things more clear.

tsteur avatar Jan 17 '22 03:01 tsteur

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/compliance-adoption-to-the-requirements-of-the-25-ttdsg-german-law-option-to-avoid-the-use-of-screen-resolution-params-for-a-website-operation-without-consent-banner/46061/2

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/question-from-data-protection-officer/47475/8

I'd like to continue the thread started by @MoritzLost. I definitely see his point because it's been a source of discussion and debate within our organization. While Matomo claims that one can run it without cookie consent, it is definitely not true in all EU member states because of article 5(3) of the ePrivacy directive that states the following: "Member States shall ensure that the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned has given his or her consent, having been provided with clear and comprehensive information, in accordance with Directive 95/46/EC....". To generate its config_id, Matomo does access information from the user_agent string (IP, browser family, version, etc.) and some data privacy authorities (UK, Belgium, Austria, etc.) consider this a breach of article 5(3) of the ePrivacy directive because it's gathered without the user's consent. It'd be great for Matomo to introduce a way to generate its config_id without collecting any of the device attributes (like using a random or unique number). I know it'd screw up some metrics like the number of visitors and others but it's a trade-off. I can see this topic being a threat to Matomo. The two reasons we introduced Matomo in the organization were that it was privacy-centric and could be run without cookie consent. Given the latter is not true in all member states, our IT team is now exploring alternatives, incl. CloudFlare analytics and other platforms.

jmbiltresse avatar Mar 28 '23 07:03 jmbiltresse

@jmbiltresse Have you looked at the new disableBrowserFeatureDetection option? This disables the fingerprinting completely, so there shouldn't be a problem. We use this alongside disableCookies, this way Matomo neither sets cookies nor accesses existing data / device information, so we can run it without consent because no PII is collected.

MoritzLost avatar Mar 28 '23 07:03 MoritzLost

@MoritzLost I do not believe the disableBrowserFeatureDetection option disables the fingerprinting completely. My understanding is that it will stop collecting the browser resolution and browser plugins to make the config_id but will still need information from the user agent string like the IP, browser version and family to compute the config_id. Some data privacy authorities consider that accessing information from the user_agent string for the sake of analytics is a breach of Article 5(3). I know the disableBrowserFeatureDetection option satisfies the data privacy authority in Germany but likely not the Belgian, UK and Austrian ones. Let me know if I've misunderstood you. Thanks!

jmbiltresse avatar Mar 28 '23 07:03 jmbiltresse

Adding @tsteur for visibility and possible feedback. Thanks!

jmbiltresse avatar Mar 28 '23 17:03 jmbiltresse

disableBrowserFeatureDetection disables the browser features indeed. It won't detect any plugins or resolution and won't use them in the fingerprint. Matomo then still uses the user agent that is sent along with the request to build the config_id, but because this data is sent with the request and not read on the user's device, this should not be causing any issues. Happy to discuss further.
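For reference, a minimal sketch of the tracking-code part for the setup discussed above, using the two tracker methods named in this thread. It only shows the queue calls; the rest of the standard snippet (tracker URL, site ID, loading matomo.js) is assumed to stay unchanged, and using `globalThis` instead of `window` is just so the lines also run outside a browser.

```javascript
// Command queue read by matomo.js once it loads.
var _paq = globalThis._paq = globalThis._paq || [];

// Queue both options before the first tracking call so that the very first
// request is already sent without browser features and without cookies.
_paq.push(['disableBrowserFeatureDetection']);
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
```

With this, the config_id is built from what arrives with the request (user agent, anonymised IP) rather than from data read on the device.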

tsteur avatar Mar 28 '23 19:03 tsteur