oteps
Sensitive Data Handling
I present a proposal for a design philosophy around handling potentially-sensitive data in our libraries, using SQL as an example throughout.
Overall great. This feels like a good start 👍
In general, the use of "we" here is confusing. Is this describing the approach you've taken at your company, or retroactively describing what you would like to see OpenTelemetry do?
I was following the recommendation in the template: "Explain the proposed change as though it was already implemented and you were explaining it to a user" and, yes, using "we" to mean "the whole OpenTelemetry community".
@yurishkuro +1 I think we should promise that we have it in the collector, but not in all the libraries.
I hear and accept the concern around the cost of owning this in each language. I will rework the proposal so that the default system (including the collector) preserves the desired behavior, allowing but not requiring instrumentation libraries to do their own scrubbing. One design concern I have for this, though, is that the collector loses some semantic information that the instrumented process has. For example, we currently use db.statement for "plain" SQL and also for Mongo, Redis, Geode, Couchbase, etc. queries. Instrumentation libraries of course know which thing they're instrumenting and can apply appropriate logic. Under this design, how will the collector know which semantic transformations to apply?
how will the collector know which semantic transformations to apply?
I think that is a matter for semantic data conventions. One of the other db.*** attributes should provide this clarification.
Exactly. The current spec has a required attribute, db.type, which specifies the type of database being called. This attribute will be passed on to the collector untouched.
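To make the idea concrete, here is a minimal sketch of how a collector-side step might pick a scrubber for db.statement based on db.type. It is only an illustration: the function names, the "sql" and "redis" type values, the regular expression, and the "<redacted>" fallback are all assumptions made for this example, not part of the spec or of any collector API.

```python
import re
from typing import Callable, Dict

# Hypothetical pattern: matches single-quoted string literals and bare numbers.
_SQL_LITERALS = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def scrub_sql(statement: str) -> str:
    """Replace SQL literals with '?' while keeping the query shape."""
    return _SQL_LITERALS.sub("?", statement)


def scrub_redis(statement: str) -> str:
    """Keep only the command name; keys and values may be sensitive."""
    parts = statement.split()
    return parts[0] + " ?" if len(parts) > 1 else statement


# Assumed db.type values; the real set is defined by the semantic conventions.
SCRUBBERS: Dict[str, Callable[[str], str]] = {
    "sql": scrub_sql,
    "redis": scrub_redis,
}


def scrub_span_attributes(attributes: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the attributes with db.statement scrubbed by db.type."""
    scrubbed = dict(attributes)
    statement = scrubbed.get("db.statement")
    if statement is None:
        return scrubbed
    scrubber = SCRUBBERS.get(scrubbed.get("db.type", ""))
    # Unknown database type: drop the statement rather than risk leaking it.
    scrubbed["db.statement"] = scrubber(statement) if scrubber else "<redacted>"
    return scrubbed
```

Failing closed for an unrecognized db.type is one possible design choice; a deployment could equally decide to pass such statements through unchanged.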
+1 that data scrubbing, encryption/sanitization (a la https://docs.honeycomb.io/authentication-and-security/secure-tenancy/ or Lightstep's satellites) should be done on the client's premises via a collector or satellite, but not necessarily in-process for every telemetry-generating SDK.
@lizthegrey, in some scenarios, data scrubbing or data validation would need to be done in the application itself due to organisation policies that don't allow for the data to be processed by a third-party application.
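As a rough illustration of the in-application case, the sketch below wraps an export callable so that sensitive attributes are redacted before any data leaves the process. The ScrubbingExporter class, its constructor arguments, and the "<redacted>" placeholder are hypothetical names invented for this example; they do not correspond to an existing OpenTelemetry SDK interface.

```python
from typing import Callable, Dict, Iterable, List

Attributes = Dict[str, str]


class ScrubbingExporter:
    """Hypothetical wrapper that redacts attributes in-process before export."""

    def __init__(
        self,
        export: Callable[[List[Attributes]], None],
        sensitive_keys: Iterable[str],
    ) -> None:
        self._export = export
        self._sensitive_keys = frozenset(sensitive_keys)

    def export(self, spans: List[Attributes]) -> None:
        # Build redacted copies so the caller's span data is left untouched.
        cleaned = [
            {
                key: ("<redacted>" if key in self._sensitive_keys else value)
                for key, value in span.items()
            }
            for span in spans
        ]
        self._export(cleaned)
```

For example, `ScrubbingExporter(real_export, ["db.statement"])` would forward every span to the assumed `real_export` callable with db.statement replaced, so no downstream component ever receives the raw value.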
(Sorry, I have a similar OTEP that I am trying to advocate for)
@johnbley we are cleaning up stale OTEP PRs. If there is no further action at this time, we will close this PR in one week. Feel free to open it again when it is time to pick it back up.