website-evidence-collector
Does not detect Etag-tracking
Reproduction
website-evidence-collector --no-output --quiet --yaml https://lucb1e.com/rp/cookielesscookies/
beacons: []
According to the GDPR (https://eur-lex.europa.eu/eli/reg/2016/679/oj), personal data is any information relating to an identifiable person, that is, a person who can be identified by reference to an identifier such as location data or an online identifier.
An HTTP Etag can serve as a method of direct server-side tracking without prior consent.
It does not detect Redirect-tracking either. See the Tracking demo on http://test.noleaks.eu/
Dear @noleakseu,
thank you for raising the topic of HTTP Etags.
Etags are produced by the web server and are meant to be a property of a resource, such as an image or HTML file. They identify the resource uniquely so that the browser can efficiently detect when its cached copy is outdated.
Further information on Etags: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
Relevant RFC standard: https://tools.ietf.org/html/rfc7232#section-2.3
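To make the caching mechanism concrete, here is a minimal sketch in Python of how a server might answer a conditional GET. The function name `revalidation_status` is made up for this illustration, and the sketch only covers the `If-None-Match` comparison (weak comparison, as RFC 7232 specifies for this header), not the other preconditions. It shows why the header matters for tracking: the browser sends back whatever Etag value it received earlier.

```python
import re
from typing import Optional

# An entity-tag is an optional weak marker W/ followed by a quoted opaque value.
_ETAG = re.compile(r'(?:W/)?"([^"]*)"')


def revalidation_status(current_etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    Illustrative only: handles If-None-Match for GET/HEAD and nothing else.
    """
    if if_none_match is None:
        return 200  # unconditional request: send the full response
    if if_none_match.strip() == "*":
        return 304  # "*" matches any existing representation
    current = _ETAG.fullmatch(current_etag.strip())
    if current is None:
        return 200  # malformed server Etag: do not claim a match
    # Weak comparison: only the opaque values are compared, W/ is ignored.
    sent_values = _ETAG.findall(if_none_match)
    return 304 if current.group(1) in sent_values else 200
```

A server that hands every visitor a different Etag value gets that value back in `If-None-Match` on the next visit, which is what makes it usable as an identifier.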
Web servers can repurpose such Etags to make them a property of the website visitor for the purpose of their identification. In this case, they are not cookies, but a similar technology. The UK data protection authority recently published guidance on the meaning of similar technologies.
The problem here is to understand how web servers employ Etags. One option could be to load resources twice and check whether the Etags have changed. If the content is the same and the Etag has changed, then the Etag may have been repurposed. This means there is a performance trade-off to make. See also: #15.
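A minimal sketch of that double-load check, in Python. It assumes the caller has already loaded the page twice, each time with a clean browser profile, and recorded every resource's Etag and body. The names `Observation`, `observe` and `repurposed_etag_candidates` are made up for this illustration and are not part of website-evidence-collector.

```python
import hashlib
from typing import Dict, List, NamedTuple, Optional


class Observation(NamedTuple):
    """What one page load recorded for one resource (hypothetical structure)."""
    etag: Optional[str]
    body_sha256: str


def observe(etag: Optional[str], body: bytes) -> Observation:
    """Store the Etag together with a hash of the body instead of the body itself."""
    return Observation(etag, hashlib.sha256(body).hexdigest())


def repurposed_etag_candidates(
    first: Dict[str, Observation], second: Dict[str, Observation]
) -> List[str]:
    """Return URLs whose body is identical in both loads but whose Etag differs."""
    flagged = []
    for url, a in first.items():
        b = second.get(url)
        # Skip resources missing from the second load or served without an Etag.
        if b is None or a.etag is None or b.etag is None:
            continue
        if a.body_sha256 == b.body_sha256 and a.etag != b.etag:
            flagged.append(url)
    return sorted(flagged)
```

A flagged URL is only a hint, not proof: some server setups can emit different Etags for identical files for reasons unrelated to tracking, so the result would still need manual review.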
Would Etags of all resources need to be checked? Is there another more efficient method to detect the identification of website users by means of Etags?
Please open a separate issue for redirect tracking. :)
Thank you for the feedback!
Personal data is any information relating to an identifiable person, who can be identified by an online identifier, regardless of the technology. At the moment WEC is unable to collect evidence of Etag-, Redirect- and HSTS-driven tracking. Yes, any tool or tech has its own limitations. Since heuristic detection might be out of scope for WEC, a Limitation section in README.md would be enough to avoid misunderstandings and excessive expectations. And enough to close the issue. :)
Can you please confirm whether the evidence on Etags is recorded in the file output/requests.har?
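One way to check this is to scan the HAR file for the relevant headers. The sketch below is in Python and assumes the file follows the standard HAR 1.2 layout (`log.entries[].request.headers` and `log.entries[].response.headers`); the function names `load_har` and `etag_evidence` are made up for this illustration.

```python
import json
from typing import Any, Dict, List, Tuple


def load_har(path: str) -> Dict[str, Any]:
    """Parse a HAR file, e.g. output/requests.har, into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def etag_evidence(har: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (url, header name, value) for each If-None-Match or ETag header."""
    found = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        # Requests carry If-None-Match, responses carry ETag.
        for side, wanted in (("request", "if-none-match"), ("response", "etag")):
            for header in entry.get(side, {}).get("headers", []):
                # HTTP header names are case-insensitive.
                if header.get("name", "").lower() == wanted:
                    found.append((url, wanted, header.get("value", "")))
    return found
```

Calling `etag_evidence(load_har("output/requests.har"))` and getting an empty list would suggest that no Etag headers were captured in that run.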
Finally, I've released a tool that detects ETag manipulation and is free from Puppeteer issues. The idea is to compare responses in different modes: first visit, returning visit and incognito. A "bad guy" will set different ETag headers pointing to the same content. The inspection code: https://github.com/noleakseu/notary/blob/main/src/main/java/notary/EtagInspection.java Criticism and openness rule the development. :)