opentelemetry.io

Baggage

Open truekonrads opened this issue 3 years ago • 9 comments

The documentation on Baggage provides examples of "non-sensitive data that you’re okay with potentially exposing to third parties", such as:

  • Account Identification;
  • User Ids;
  • Product Ids; and
  • origin IPs.

Under GDPR, this user information is likely to be considered personal information and should not be shared with third parties without express permission from the user.

Suggestion: rewrite this example.

truekonrads avatar Aug 06 '22 04:08 truekonrads

Hmm, I'm not so sure. My rough reading online is that a user ID made up of non-PII is not necessarily considered PII.

What would you recommend, given that Baggage is information that could be exposed to third parties since it sits in HTTP headers?

cartermp avatar Aug 07 '22 01:08 cartermp

"a user ID made up of non-PII is not necessarily considered PII."

I would assume that is rarely the case? It might be true if one uses "roles as user" like "admin" or something, but under GDPR even a pseudonym or pseudonymization (hashing) quickly becomes problematic. I read a great paper on that topic a few months back; let me see if I can find it.

Independent of that discussion, I agree with @truekonrads that providing some examples that are not potential PII might be better (and would also give people some idea of what they could use instead of IPs, user IDs, etc.):

  • User segmentation (loyalty group, service level group)
  • SaaS tenant
  • High-level geo data (country, state, city) or location data specific to your company (office, store location, ...)
  • Product / booking details (product IDs, flight details such as departure and destination, but maybe not the seat number)
  • Any other information that might influence the behavior of a downstream service and thereby lead to performance issues.
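To make the suggestion concrete, here is a minimal sketch of how cohort-style entries like the ones above could end up in a `baggage` HTTP header. It assumes the W3C Baggage wire format (comma-separated `key=value` members with percent-encoded values) and uses only the Python standard library rather than the OpenTelemetry API; the function name and the entry keys (`user.segment`, `saas.tenant`, `geo.city`) are hypothetical and chosen for illustration only.

```python
from urllib.parse import quote


def build_baggage_header(entries: dict[str, str]) -> str:
    """Serialize entries as a W3C-style `baggage` header value.

    Assumption: keys are already valid header tokens; only values are
    percent-encoded here.
    """
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in entries.items()
    )


# Hypothetical, non-PII cohort data instead of user IDs or origin IPs.
example_entries = {
    "user.segment": "gold",
    "saas.tenant": "acme-eu",
    "geo.city": "New York",
}

print(build_baggage_header(example_entries))
# prints: user.segment=gold,saas.tenant=acme-eu,geo.city=New%20York
```

None of these values singles out an individual, yet they may still be enough to explain why a downstream service behaves differently for one cohort.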

@truekonrads would you be open to providing a PR with a different list of examples?

svrnm avatar Aug 08 '22 09:08 svrnm

"I would assume that is rarely the case?"

This is one example I was referring to: https://developers.google.com/analytics/solutions/crm-integration#user_id

cartermp avatar Aug 08 '22 12:08 cartermp

Thanks for sharing! Coming from GDPR-land and having had my share of customer conversations on that topic, I might be overthinking this: "non obfuscated alphanumeric database identifiers" and "encrypted identifier that is based on PII" are both pseudonymised data that can become PII when combined with other data (e.g. after a data breach of the database holding those identifiers). So I personally prefer not to recommend collecting that kind of data, and instead to provide alternatives that might be good enough (like a way to identify the cohort a user belongs to, which helps you troubleshoot your performance issue).

Here's the document I was referring to (a great good night read if you have trouble sleeping) https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf

"A specific pitfall is to consider pseudonymised data to be equivalent to anonymised data. The Technical Analysis section will explain that pseudonymised data cannot be equated to anonymised information as they continue to allow an individual data subject to be singled out and linkable across different data sets. Pseudonymity is likely to allow for identifiability, and therefore stays inside the scope of the legal regime of data protection."

svrnm avatar Aug 08 '22 16:08 svrnm

I also wanted to chip in that logging raw session IDs is not a good idea (hashed could be suitable). Whoever views the logs can set the session ID in a cookie and act as the user. This is in no way an advert for the product, but Google Cloud DLP transformations, which do data masking and format-preserving encryption, could be a good processor.
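As a rough illustration of the "hashed" option, the sketch below replaces a session ID with a keyed hash (HMAC-SHA256) before it is logged, so that someone reading the logs cannot paste the value back into a cookie. The function name and the way the key is supplied are assumptions for this example, not part of any OpenTelemetry API. Note that, per the earlier comments in this thread, the result is still pseudonymised data and not anonymised data.

```python
import hashlib
import hmac


def pseudonymize_session_id(session_id: str, secret_key: bytes) -> str:
    """Return a keyed hash of the session ID that is safe to write to logs.

    Assumption: `secret_key` is kept outside the logging pipeline, so log
    readers can correlate entries but cannot recover or replay the session.
    """
    digest = hmac.new(secret_key, session_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical values for illustration only.
log_token = pseudonymize_session_id("s3ss10n-abc123", b"example-key")
print(f"session={log_token}")
```

The same session always maps to the same token under one key, so log lines can still be grouped per session while troubleshooting.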

truekonrads avatar Aug 09 '22 03:08 truekonrads

Baggage is not the only place affected by this kind of "please handle with care" situation and it's not the right place to answer all those questions.

In the spec we have a call out for identity attributes:

"Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by default and then provide a configuration parameter to turn on retention for use cases where the information is required and would not violate any policies or regulations."
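The "drop by default, opt in to retain" behaviour described in that call out could look roughly like the sketch below. This is not how any particular SDK or exporter implements it; the function, the `retain_identity` parameter, and the exact set of attribute keys are assumptions made for illustration.

```python
# Assumed set of identity attribute keys; check the semantic conventions
# for the authoritative list.
IDENTITY_ATTRIBUTE_KEYS = frozenset({"enduser.id", "enduser.role", "enduser.scope"})


def filter_attributes(attributes: dict, retain_identity: bool = False) -> dict:
    """Drop identity attributes unless retention is explicitly enabled."""
    if retain_identity:
        return dict(attributes)
    return {
        key: value
        for key, value in attributes.items()
        if key not in IDENTITY_ATTRIBUTE_KEYS
    }


span_attributes = {"enduser.id": "jdoe", "http.method": "GET"}
print(filter_attributes(span_attributes))
# prints: {'http.method': 'GET'}
print(filter_attributes(span_attributes, retain_identity=True))
# prints: {'enduser.id': 'jdoe', 'http.method': 'GET'}
```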

So, here's my proposal:

  • have some more non-PII examples for baggage
  • keep some of the PII (or potential PII) examples and add a note that this data needs to be handled with care.

@truekonrads would you be open to providing a PR with changes?

svrnm avatar Aug 09 '22 10:08 svrnm

I think it's also important that examples of this kind of information are specifically called out (like they are today) in the baggage doc because they sit in HTTP headers rather than the body, making them eminently more "sniffable". Words and examples can change of course, but I want to preserve that dynamic in this doc specifically, since it's a unique dynamic compared to attaching similar fields on attributes for spans/logs/span events/span links.
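To show how little effort "sniffing" takes, here is a small sketch of what any recipient of a request (including a third party) could do with the headers it receives. It assumes the W3C Baggage wire format and ignores optional member properties after `;`; the function name and the sample header are hypothetical.

```python
from urllib.parse import unquote


def read_baggage(headers: dict[str, str]) -> dict[str, str]:
    """Extract baggage entries from a plain mapping of HTTP headers."""
    # Header names are case-insensitive, so match on the lowercased name.
    raw = next((v for k, v in headers.items() if k.lower() == "baggage"), "")
    entries = {}
    for member in raw.split(","):
        key_value = member.split(";", 1)[0]  # drop optional properties
        if "=" not in key_value:
            continue
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


incoming = {"Baggage": "saas.tenant=acme-eu,geo.city=New%20York;ttl=1"}
print(read_baggage(incoming))
# prints: {'saas.tenant': 'acme-eu', 'geo.city': 'New York'}
```

Nothing here requires access to the request body or to any telemetry backend, which is why the baggage doc is a sensible place for the warning.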

cartermp avatar Aug 09 '22 15:08 cartermp

@svrnm I'll give it a try

truekonrads avatar Aug 11 '22 12:08 truekonrads

thanks @truekonrads

svrnm avatar Aug 16 '22 17:08 svrnm