Feature: provide options for programmatic client-side data sanitization
Is your feature request related to a problem? Please describe.
I currently instrument an application which talks to an API that takes its auth info via a query parameter, access_token=PASSWORD. This gets transported to Elasticsearch and stored there for everyone to see.
Describe the solution you'd like
Let me replace parts of the data that gets sent to Elasticsearch with stars (*).
So instead of seeing https://url.tld?access_token=<SECRET_TOKEN> I would see https://url.tld?access_token=*******
Redaction like this could be achieved by providing an API along the lines of the Python agent's @for_events decorator, or the Node.js agent's filter API.
Alternatively, we could provide more module-specific options, e.g. an apmhttp option which accepts a function that controls what is recorded for client spans, providing an opportunity to redact the recorded URL.
Have you tried using ELASTIC_APM_SANITIZE_FIELD_NAMES for this?
Please also use GitHub issues only for confirmed bugs and feature requests, and use the APM discuss forum for general questions, to make them easier for other users to find.
I looked at the documentation and found it will not do what I want it to do.
A list of patterns to match the names of HTTP headers, cookies, and POST form fields to redact.
And I am talking about the URL.
It also seems that the default configuration should pick this up. I even spent a while looking into the code of this module and could not find anything that manipulates the URL.
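The sketch below shows how a wildcard pattern list of the kind quoted above could match parameter names. It assumes that `*` matches any run of characters and that matching is case-insensitive. Both are assumptions made for this sketch, not a description of the agent's own matcher. Under those rules `access_token` matches `*token*`, which is why one might expect it to be picked up. The documented scope, however, covers only headers, cookies and POST form fields.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compilePatterns turns wildcard patterns such as "*token*" into
// case-insensitive, fully anchored regular expressions.
func compilePatterns(patterns []string) []*regexp.Regexp {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		quoted := regexp.QuoteMeta(p) // escapes '*' as `\*`
		expr := "(?i)^" + strings.ReplaceAll(quoted, `\*`, ".*") + "$"
		res = append(res, regexp.MustCompile(expr))
	}
	return res
}

// matchesAny reports whether name matches at least one compiled pattern.
func matchesAny(name string, res []*regexp.Regexp) bool {
	for _, re := range res {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	res := compilePatterns([]string{"password", "*token*", "*session*"})
	for _, name := range []string{"access_token", "ACCESS_TOKEN", "fields"} {
		fmt.Printf("%s: %v\n", name, matchesAny(name, res))
	}
}
```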
ELASTIC_APM_SANITIZE_FIELD_NAMES won't work for this, as it is only relevant to HTTP transactions and not client spans.
The Elastic APM agents take the view that query strings should not contain sensitive data, following the advice of OWASP: https://owasp.org/www-community/vulnerabilities/Information_exposure_through_query_strings_in_url.
Assuming you do not have the option of changing the API you're calling, there are a couple of things you can do to address this:
- customise the "apm" Ingest node pipeline to modify the document before it is indexed: https://www.elastic.co/guide/en/kibana/current/ingest-node-pipelines.html
- add your own client middleware which temporarily redacts the query string before the apmhttp client middleware is invoked
FYI the URL is recorded by this call: https://github.com/elastic/apm-agent-go/blob/580656d8fb774bd4d6afd81961da13feb80f0e01/module/apmhttp/client.go#L114
After some digging, I found that the Facebook API actually supports receiving the access token via headers.
It just isn't in their documentation, which everywhere suggests passing it as a query parameter.
Still, I believe that the Python Elastic APM agent, for example, supports this, and I feel the Go agent should support this feature too.
That's fair. I've held off adding something like this to the Go agent in the past, because:
- the high-level types (apm.Transaction, apm.Span, etc.) are effectively write-only by design, to minimise allocations and keep the overhead low
- the lower-level "model" types have been considered an unstable API, hence not exposed to users
We could introduce an API along the lines of the Python one that operates on the model types, but mark it as unstable. In practice I expect breaking changes to be rare, but they may still happen. I'll modify the issue title and description so we can track this.