apm-agent-go

Feature: provide options for programmatic client-side data sanitization

Open phumberdroz opened this issue 4 years ago • 5 comments

Is your feature request related to a problem? Please describe. I am currently instrumenting an application that talks to an API which takes its auth info via a query parameter, access_token=PASSWORD. This gets sent to Elasticsearch and stored there for everyone to see.

Describe the solution you'd like Let me replace parts of the data that gets sent to Elasticsearch with stars (*). So instead of seeing https://url.tld?access_token=<SECRET_TOKEN>, I would see https://url.tld?access_token=*******

This could be achieved by providing an API along the lines of the Python agent's @for_events decorator, or the Node.js agent's filter API.

Alternatively, we could provide more module-specific options, e.g. an apmhttp option that accepts a function controlling what is recorded for client spans, providing an opportunity to redact the recorded URL.
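To illustrate the kind of function such an option might accept, here is a minimal sketch that masks selected query parameters using only the standard library. `redactQueryParams` and the `REDACTED` mask are assumptions made for illustration; nothing like this exists in apm-agent-go today. A word is used as the mask rather than stars because `url.Values.Encode` percent-encodes `*`.

```go
package main

import "net/url"

// redactQueryParams returns rawURL with the values of the named query
// parameters replaced by a fixed mask. Hypothetical helper for
// illustration only; it is not part of apm-agent-go.
func redactQueryParams(rawURL string, names ...string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, name := range names {
		// Only touch parameters that are actually present.
		if _, ok := q[name]; ok {
			q.Set(name, "REDACTED")
		}
	}
	// Encode sorts parameters by key, so the order may differ from the
	// original URL. That is acceptable for a value that is only recorded.
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

Called as `redactQueryParams("https://url.tld?access_token=secret&page=2", "access_token")`, this should leave `page` untouched and mask only the token value.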

phumberdroz • Jan 11 '21 09:01

Have you tried using ELASTIC_APM_SANITIZE_FIELD_NAMES for this?

Please also use GitHub issues only for confirmed bugs and feature requests, and use the APM discuss forum for general questions, to make them easier for other users to find.

simitt • Jan 11 '21 10:01

I looked at the documentation and found it will not do what I want it to do.

A list of patterns to match the names of HTTP headers, cookies, and POST form fields to redact.

And I am talking about the URL.

And if it did apply, it seems the default configuration should already pick this up. I even looked into the code of this module for a bit and could not find anything that manipulates the URL.

phumberdroz • Jan 11 '21 10:01

ELASTIC_APM_SANITIZE_FIELD_NAMES won't work for this, as it is only relevant to HTTP transactions and not client spans.

The Elastic APM agents take the view that query strings should not contain sensitive data, following the advice of OWASP: https://owasp.org/www-community/vulnerabilities/Information_exposure_through_query_strings_in_url.

Assuming you do not have the option of changing the API you're calling, there are a couple of things you can do to address this:

  • customise the "apm" Ingest node pipeline to modify the document before it is indexed: https://www.elastic.co/guide/en/kibana/current/ingest-node-pipelines.html
  • add your own client middleware which temporarily redacts the query string, prior to the apmhttp client middleware being invoked

FYI the URL is recorded by this call: https://github.com/elastic/apm-agent-go/blob/580656d8fb774bd4d6afd81961da13feb80f0e01/module/apmhttp/client.go#L114

axw • Jan 11 '21 10:01

After some digging I found that the Facebook API actually supports receiving the access token via headers.

It is just not in their documentation, and everywhere else it looks like they want to receive it via a query parameter.

Still, I believe that, for example, the Python Elastic APM library supports this, and I feel the Go Elastic APM agent should support this feature too.

phumberdroz • Jan 11 '21 12:01

Still I believe that for example the python elastic apm library supports this and I feel the go elastic apm should support this feature.

That's fair. I've held off adding something like this to the Go agent in the past, because:

  • the high-level types (apm.Transaction, apm.Span, etc.) are effectively write-only by design, to minimise allocations and keep the overhead low
  • the lower-level "model" types have been considered an unstable API, hence not exposed to users

We could introduce an API along the lines of the Python one that operates on the model types, but mark it as unstable. In practice I expect breaking changes will be rare, but they may still happen. I'll modify the issue title and description so we can track this.
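For the sake of discussion, a filter API over model types might look something like the sketch below. Everything in it is hypothetical: `SpanModel` is a simplified stand-in and not the agent's real model type, and `SpanFilter`, `applySpanFilters` and `stripQuery` do not exist in apm-agent-go. The idea is only that each filter can modify an event in place, or return false to drop it, before it is sent.

```go
package main

import "strings"

// SpanModel is a hypothetical, heavily simplified stand-in for an
// internal model type. It is NOT the agent's actual type.
type SpanModel struct {
	Name string
	URL  string
}

// SpanFilter may modify the span in place. Returning false drops the span.
// Hypothetical type for illustration only.
type SpanFilter func(*SpanModel) bool

// applySpanFilters runs the filters in order and stops at the first one
// that asks for the span to be dropped.
func applySpanFilters(s *SpanModel, filters ...SpanFilter) bool {
	for _, f := range filters {
		if !f(s) {
			return false
		}
	}
	return true
}

// stripQuery is an example filter that removes the query string from the
// recorded URL and always keeps the span.
func stripQuery(s *SpanModel) bool {
	if i := strings.IndexByte(s.URL, '?'); i >= 0 {
		s.URL = s.URL[:i]
	}
	return true
}
```

A user would register something like `stripQuery` once, and the agent would run the filters on each span just before encoding it.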

axw • Jan 12 '21 04:01