Sensitive Data Handling

Open johnbley opened this issue 4 years ago • 10 comments

I present a proposal for a design philosophy around handling potentially-sensitive data in our libraries, using SQL as an example throughout.

johnbley avatar Apr 27 '20 23:04 johnbley

Overall great. This feels like a good start 👍

carlosalberto avatar Apr 28 '20 13:04 carlosalberto

In general, the use of "we" here is confusing. Is this describing the approach you've taken at your company, or retroactively describing what you would like to see OpenTelemetry do?

dyladan avatar Apr 28 '20 16:04 dyladan

In general, the use of "we" here is confusing. Is this describing the approach you've taken at your company, or retroactively describing what you would like to see OpenTelemetry do?

I was following the recommendation in the template: "Explain the proposed change as though it was already implemented and you were explaining it to a user" and, yes, using "we" to mean "the whole OpenTelemetry community".

johnbley avatar Apr 28 '20 22:04 johnbley

@yurishkuro +1 I think we should promise that we have it in the collector, but not in all the libraries.

bogdandrutu avatar Apr 30 '20 01:04 bogdandrutu

I hear and accept the concern around the cost of owning this in each language. I will rework the proposal so that the default system (including the collector) preserves the desired behavior, allowing but not requiring instrumentation libraries to do their own scrubbing. One design concern I have with this, though, is that the collector loses some semantic information that the instrumented process has. For example, we currently use db.statement for "plain" SQL and also for Mongo, Redis, Geode, Couchbase, etc. queries. Instrumentation libraries of course know which thing they're instrumenting and can apply appropriate logic. Under this design, how will the collector know which semantic transformations to apply?

johnbley avatar May 01 '20 12:05 johnbley

how will the collector know which semantic transformations to apply?

I think that is a matter for semantic data conventions. One of the other db.*** attributes should provide this clarification.

yurishkuro avatar May 02 '20 01:05 yurishkuro

how will the collector know which semantic transformations to apply?

I think that is a matter for semantic data conventions. One of the other db.*** attributes should provide this clarification.

Exactly. The current spec has a required attribute db.type, which specifies the type of database being called. This attribute will be passed on to the collector untouched.
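
To illustrate the idea, here is a minimal sketch of how a collector-side step might pick a scrubbing rule for db.statement based on db.type. It is not taken from the collector or from the OTEP: the function names, the `SCRUBBERS` table, the example db.type values ("sql", "redis"), and the choice to redact statements of unknown type are all assumptions made for illustration, and the regex-based SQL scrubbing is deliberately naive.

```python
import re

# Hypothetical scrubbers, keyed by the value of the db.type attribute.
# The patterns are simplistic and only meant to show the dispatch idea.
_SQL_STRING = re.compile(r"'(?:[^']|'')*'")   # single-quoted SQL string literals
_SQL_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")  # standalone numeric literals


def scrub_sql(statement: str) -> str:
    """Replace string and numeric literals with '?'."""
    statement = _SQL_STRING.sub("?", statement)
    return _SQL_NUMBER.sub("?", statement)


def scrub_redis(statement: str) -> str:
    """Keep the command name and redact every argument."""
    parts = statement.split()
    if not parts:
        return statement
    return " ".join([parts[0]] + ["?"] * (len(parts) - 1))


# Assumed db.type values; the real set is defined by the semantic conventions.
SCRUBBERS = {"sql": scrub_sql, "redis": scrub_redis}


def scrub_attributes(attributes: dict) -> dict:
    """Return a copy of the span attributes with db.statement scrubbed."""
    cleaned = dict(attributes)
    if "db.statement" not in cleaned:
        return cleaned
    scrubber = SCRUBBERS.get(cleaned.get("db.type"))
    if scrubber is None:
        # Unknown db.type: redact the whole statement rather than risk a leak.
        cleaned["db.statement"] = "?"
    else:
        cleaned["db.statement"] = scrubber(cleaned["db.statement"])
    return cleaned
```

With a table like this, supporting another database type means registering one more function, and the instrumented process only has to send db.type alongside db.statement.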

arminru avatar May 04 '20 14:05 arminru

+1 that data scrubbing, encryption/sanitization (a la https://docs.honeycomb.io/authentication-and-security/secure-tenancy/ or Lightstep's satellites) should be done on client's premises via a collector or satellite, but not necessarily in-process for every telemetry generating SDK.

lizthegrey avatar May 05 '20 18:05 lizthegrey

@lizthegrey,

In some scenarios, data scrubbing or data validation would need to be done in the application itself due to organisation policies that don't allow for the data to be processed by a third party application.

(Sorry, I have a similar OTEP that I am trying to advocate for)

MovieStoreGuy avatar Dec 02 '21 22:12 MovieStoreGuy

@johnbley we are cleaning up stale OTEP PRs. If there is no further action at this time, we will close this PR in one week. Feel free to open it again when it is time to pick it back up.

tedsuo avatar Jul 31 '23 16:07 tedsuo