opentelemetry-ruby-contrib

Obfuscate query values in db.statement for mysql2 instrumentation.

[Open] sribalakumar opened this issue 4 years ago • 3 comments

The current mysql2 instrumentation records the full query, including all column values. These values may contain sensitive information, which is a hindrance to adopting the instrumentation in production environments.

https://github.com/open-telemetry/opentelemetry-ruby/blob/master/instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb#L37

def query(sql, options = {})
  tracer.in_span(
    database_span_name(sql),
    attributes: client_attributes.merge(
      'db.statement' => sql # raw SQL, literal values included
    ),
    kind: :client
  ) do
    super(sql, options)
  end
end

The OpenTelemetry specification also notes that the value of the db.statement span attribute may be sanitized to exclude sensitive information.

Adoption would be faster if we could follow an approach similar to New Relic's, whose agent has an obfuscation utility that masks values in query statements: https://github.com/newrelic/newrelic-ruby-agent/blob/006dd1bb8174e6f49c495c7e1a8ca543de9ceb93/lib/new_relic/agent/database/obfuscator.rb#L57
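
For illustration, the kind of masking such a utility performs can be sketched with a single regular expression. This is a hypothetical helper (`obfuscate_sql` and `SQL_LITERAL` are illustrative names, not New Relic's or OpenTelemetry's API). It assumes MySQL's default quoting rules, where both single- and double-quoted text are string literals, and replaces string and numeric literals with `?`.

```ruby
# Hypothetical helper, not New Relic's implementation: a regex-based sketch
# that masks string and numeric literals in a SQL statement with '?'.
# Known gaps: comments, hex literals, and digits inside quoted identifiers.
SQL_LITERAL = /
    '(?:[^'\\]|\\.|'')*'     # single-quoted string
  | "(?:[^"\\]|\\.|"")*"     # double-quoted string, a literal in default MySQL mode
  | \b\d+(?:\.\d+)?\b        # integer or decimal number
/x

def obfuscate_sql(sql)
  sql.gsub(SQL_LITERAL, '?')
end

puts obfuscate_sql("SELECT * FROM users WHERE email = 'a@b.com' AND id = 42")
# prints: SELECT * FROM users WHERE email = ? AND id = ?
```

The result could then be recorded as db.statement instead of the raw sql. A single regex is easy to audit but blind to context: it would also rewrite the digit inside a backtick-quoted identifier such as `order 1`.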

sribalakumar, Aug 05 '20 11:08

The consensus in OTEP 100 is that this mechanism will be provided in the collector but not necessarily in-process for all languages.

fbogsany, Aug 05 '20 13:08

> The consensus in OTEP 100 is that this mechanism will be provided in the collector but not necessarily in-process for all languages.

@fbogsany I can't find any such statement at that link. In fact, the Internal details section describes a lexer in opentelemetry-java-instrumentation that would replace sensitive values with ?.

Quoting the relevant part of the Internal details section:

> That said, I have worked on sql normalization at three prior APM companies and am working on contributing a simple first version of one for the opentelemetry-auto-instr-java repo. It is based on using a lexer to parse out sql numeric and string literals and replacing them with a ?, exactly as described above and done by many APM products on the market.
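
The lexer approach described in that quote could look roughly like the following in Ruby. This is a minimal sketch, not the opentelemetry-java-instrumentation code: `SqlNormalizer` and its token patterns are hypothetical and assume MySQL's default lexical rules, and SQL comments are not handled. Because the statement is scanned token by token, identifiers such as t1 or a backtick-quoted `order 1` are consumed whole and never mistaken for numeric literals.

```ruby
require 'strscan'

# Hypothetical lexer-based normalizer: walks the statement token by token
# and replaces numeric and string literals with '?', keeping everything else.
module SqlNormalizer
  # Token patterns, tried in the order used by normalize below.
  QUOTED_IDENT = /`(?:[^`]|``)*`/                                   # backtick identifier
  WORD         = /[A-Za-z_$][A-Za-z0-9_$]*/                         # keywords and identifiers
  STRING       = /'(?:[^'\\]|\\.|'')*'|"(?:[^"\\]|\\.|"")*"/        # quoted string literals
  NUMBER       = /0[xX]\h+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/          # hex, integer, decimal
  OTHER        = /\s+|./m                                           # whitespace, operators, punctuation

  def self.normalize(sql)
    scanner = StringScanner.new(sql)
    out = +''
    until scanner.eos?
      if (ident = scanner.scan(QUOTED_IDENT) || scanner.scan(WORD))
        out << ident                 # identifiers and keywords are kept verbatim
      elsif scanner.scan(STRING) || scanner.scan(NUMBER)
        out << '?'                   # literal values are masked
      else
        out << scanner.scan(OTHER)   # always consumes at least one character
      end
    end
    out
  end
end

puts SqlNormalizer.normalize("UPDATE `order 1` SET price = 9.99 WHERE code = 0x1F")
# prints: UPDATE `order 1` SET price = ? WHERE code = ?
```

Since only text the lexer has classified as a literal is masked, supporting another construct (for example comments) means adding a token pattern rather than tuning one large expression.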

sribalakumar, Aug 07 '20 05:08

See the review comments on that issue to get a better sense of what the community is thinking about here.

fbogsany, Aug 10 '20 18:08