Baggage
The documentation on Baggage provides examples of "non-sensitive data that you’re okay with potentially exposing to third parties", such as:
- Account Identification;
- User Ids;
- Product Ids; and
- origin IPs.
Under GDPR, this user information is likely to be considered personal information and should not be shared with third parties without express permission from the user.
Suggestion: rewrite this example.
Hmm, I'm not so sure. My rough reading online is that a user ID made up of non-PII is not necessarily considered PII.
What would you recommend, given that Baggage is information that could be exposed to third parties since it sits in HTTP headers?
a user ID made up of non-PII is not necessarily considered PII.
I would assume that is rarely the case? It might be true if one uses "roles as user" like "admin" or something, but under GDPR even a pseudonym or pseudonymization (hashing) becomes problematic quickly. I read a great paper on that topic a few months back, let me see if I can find it.
Independent of that discussion, I agree with @truekonrads that providing some examples that are not potentially PII might be better (and would also give people some idea of what they could use instead of IPs, user IDs, etc.):
- User Segmentation (loyalty group, service level groups)
- SaaS Tenant
- High-level geo data (country, state, city) or specific data relevant for your company (office, store location, ...)
- Product / booking details (product IDs, flight details such as departure and destination, though maybe not the seat number)
- Any other information that might influence the behavior of a downstream service and thereby lead to performance issues.
@truekonrads are you open to providing a PR with a different list of examples?
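To make the list above concrete, here is a minimal sketch of how such non-PII values could end up on the wire. It hand-rolls a W3C-style `baggage` header with the Python standard library instead of using an OpenTelemetry SDK. The function names and all keys and values (`loyalty.tier`, `tenant`, and so on) are hypothetical, and the parser ignores optional member properties.

```python
from urllib.parse import quote, unquote


def build_baggage_header(entries: dict[str, str]) -> str:
    """Serialize entries as comma-separated key=value pairs, percent-encoding values."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def parse_baggage_header(header: str) -> dict[str, str]:
    """Read the header back, as any party that sees the request headers could."""
    entries = {}
    for member in header.split(","):
        key, _, value = member.strip().partition("=")
        entries[key] = unquote(value)
    return entries


# Hypothetical cohort-style values instead of user IDs or origin IPs.
baggage = {
    "loyalty.tier": "gold",
    "tenant": "acme-eu",
    "geo.country": "DE",
    "product.id": "sku-1234",
}

header = build_baggage_header(baggage)
print(header)
print(parse_baggage_header(header))
```

Nothing in the header is protected: whoever receives or observes the request can parse it as easily as the sender built it, which is why cohort-level values are a safer default here.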
I would assume that is rarely the case?
This is one example I was referring to: https://developers.google.com/analytics/solutions/crm-integration#user_id
Thanks for sharing! Coming from GDPR-land and having had my share of customer conversations on that topic, I might be overthinking this: "non obfuscated alphanumeric database identifiers" or an "encrypted identifier that is based on PII" are both pseudonymised data that can become PII when combined with other data (e.g. after a data breach of the database holding those identifiers). So I personally prefer not to recommend collecting that kind of data and to offer alternatives that might be good enough instead (like a way to identify the cohort a user belongs to, which helps you troubleshoot your performance issue).
Here's the document I was referring to (a great bedtime read if you have trouble sleeping): https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
A specific pitfall is to consider pseudonymised data to be equivalent to anonymised data. The Technical Analysis section will explain that pseudonymised data cannot be equated to anonymised information as they continue to allow an individual data subject to be singled out and linkable across different data sets. Pseudonymity is likely to allow for identifiability, and therefore stays inside the scope of the legal regime of data protection
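A small sketch of why the quoted passage matters in practice, assuming a user identifier is pseudonymised with an unsalted SHA-256 hash. The datasets, the email addresses, and the `pseudonymise` helper are all hypothetical; the point is only that the same pseudonym links records across data sets and can be reversed by anyone holding a list of candidate identifiers.

```python
import hashlib


def pseudonymise(user_id: str) -> str:
    """Unsalted hash: deterministic, so the same input always yields the same pseudonym."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# Two hypothetical data sets held by different parties.
telemetry = [{"user": pseudonymise("alice@example.com"), "page": "/checkout"}]
support_tickets = [{"user": pseudonymise("alice@example.com"), "issue": "refund"}]

# Linkability: records about the same person can be joined on the pseudonym.
linked = [
    (t, s) for t in telemetry for s in support_tickets if t["user"] == s["user"]
]

# Re-identification: hash a list of known identifiers (e.g. from a breach)
# and look the pseudonym up.
known_identifiers = ["bob@example.com", "alice@example.com"]
lookup = {pseudonymise(candidate): candidate for candidate in known_identifiers}
reidentified = lookup.get(telemetry[0]["user"])

print(len(linked), reidentified)
```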
I also wanted to chip in that logging raw session IDs is not a good idea (hashed could be suitable). Whoever views the logs can set the session ID in a cookie and act as the user. This is in no way an advert for the product, but Google Cloud DLP transformations, which do data masking and format-preserving encryption, could be a good processor.
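As a rough illustration of the "hashed" option, the sketch below logs a keyed fingerprint of the session ID instead of the raw value. It uses HMAC-SHA256 from the Python standard library rather than Google Cloud DLP. The `LOG_KEY` handling and the `session_fingerprint` name are assumptions for the example; in a real service the key would come from a secret store so fingerprints stay stable across restarts.

```python
import hashlib
import hmac
import logging
import secrets

# Hypothetical key; generated at startup here only to keep the example self-contained.
LOG_KEY = secrets.token_bytes(32)


def session_fingerprint(session_id: str, key: bytes = LOG_KEY) -> str:
    """Return a short keyed digest that can correlate log lines but cannot be replayed as a cookie."""
    digest = hmac.new(key, session_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


logging.basicConfig(level=logging.INFO)

session_id = secrets.token_urlsafe(32)  # stand-in for a real session ID
logging.info("request handled session=%s", session_fingerprint(session_id))
```

Log readers can still group requests by fingerprint, but the value is useless for impersonating the user.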
Baggage is not the only place affected by this kind of "please handle with care" situation and it's not the right place to answer all those questions.
In the spec we have a callout for identity attributes:
Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by default and then provide a configuration parameter to turn on retention for use cases where the information is required and would not violate any policies or regulations.
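The drop-by-default behavior described in that callout could be sketched roughly as follows. This is not an actual SDK or exporter API: `scrub_attributes` and `retain_identity` are hypothetical names, and the sketch assumes the identity attributes are the ones whose keys start with `enduser.`.

```python
# Assumption: identity attributes share the "enduser." key prefix.
IDENTITY_PREFIX = "enduser."


def scrub_attributes(attributes: dict, retain_identity: bool = False) -> dict:
    """Drop identity attributes unless retention was explicitly turned on."""
    if retain_identity:
        return dict(attributes)
    return {
        key: value
        for key, value in attributes.items()
        if not key.startswith(IDENTITY_PREFIX)
    }


span_attributes = {
    "http.method": "GET",
    "enduser.id": "jdoe",
    "enduser.role": "admin",
}

print(scrub_attributes(span_attributes))
print(scrub_attributes(span_attributes, retain_identity=True))
```

The default keeps the sensitive fields out; retaining them is an explicit, configurable decision.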
So, here's my proposal:
- have some more non-PII examples for baggage
- keep some of the PII (or potential PII) examples and add a note that this data needs to be handled with care.
@truekonrads would you be open to providing a PR with these changes?
I think it's also important that examples of this kind of information are specifically called out (like they are today) in the baggage doc because they sit in HTTP headers rather than the body, making them eminently more "sniffable". Words and examples can change of course, but I want to preserve that point in this doc specifically, since it's unique compared to attaching similar fields as attributes on spans, logs, span events, or span links.
@svrnm I'll give it a try
thanks @truekonrads