elasticsearch-py
Add OpenTelemetry support
There are currently three ways to instrument an application that uses the Elasticsearch Python client:
- Using the Elastic APM Python agent
- Using the OpenTelemetry contrib elasticsearch-py instrumentation
- For completeness, I suppose using the Elastic Intake API manually is an option.
The main benefit of the existing approaches is that they can handle past versions of this client. However, they're fragile to changes in elasticsearch-py, difficult to test, and may be suboptimal because the code may need refactoring to emit spans with complete information. In the case of the OpenTelemetry instrumentations, they don't have clear owners. For these reasons, it makes sense to implement this in elasticsearch-py, where the feature will be tested and maintained.
We will follow the existing Semantic conventions for Elasticsearch and emit spans for all requests when the optional opentelemetry-api package is detected. Correctly configuring the OpenTelemetry SDK will be left to the user.
The elasticsearch-ruby OpenTelemetry instrumentation will be used as inspiration:
- https://github.com/elastic/elasticsearch-ruby/pull/2179 makes sure to send enough metadata to the transport
- https://github.com/elastic/elastic-transport-ruby/pull/54 performs the actual instrumentation
Configuration
- OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_ENABLED (default: true): Enable / Disable the OpenTelemetry instrumentation. With this configuration option you can enable (default) or disable the built-in OpenTelemetry instrumentation.
- OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_CAPTURE_SEARCH_QUERY (default: omit): Capture search request bodies. Per default, the built-in OpenTelemetry instrumentation does not capture request bodies due to data privacy considerations. You can use this option to enable capturing of search queries from the request bodies of Elasticsearch search requests in case you wish to gather this information regardless. The options are to capture the raw search query, sanitize the query with a default list of sensitive keys, or not capture it at all. Valid options: omit, sanitize, raw
- OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_SEARCH_QUERY_SANITIZE_KEYS (default: None): Sanitize the Elasticsearch search request body. You can configure the list of keys whose values are redacted when the search query is captured. Values must be comma-separated.
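As a rough illustration of how these three options could be read and applied, here is a sketch using only the standard library. The function names `load_config` and `sanitize`, the `"REDACTED"` placeholder, and the accepted spellings of the boolean are assumptions for the example, not the behavior of the final implementation.

```python
import os

ENABLED_VAR = "OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_ENABLED"
CAPTURE_VAR = "OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_CAPTURE_SEARCH_QUERY"
SANITIZE_VAR = "OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_SEARCH_QUERY_SANITIZE_KEYS"

VALID_CAPTURE_MODES = ("omit", "sanitize", "raw")


def load_config(environ=None):
    """Hypothetical helper: parse the three options from an environment mapping."""
    env = os.environ if environ is None else environ
    enabled = env.get(ENABLED_VAR, "true").strip().lower() == "true"
    capture = env.get(CAPTURE_VAR, "omit").strip().lower()
    if capture not in VALID_CAPTURE_MODES:
        raise ValueError(f"{CAPTURE_VAR} must be one of {VALID_CAPTURE_MODES}")
    # Comma-separated list; blanks and surrounding whitespace are ignored.
    keys = [k.strip() for k in env.get(SANITIZE_VAR, "").split(",") if k.strip()]
    return {"enabled": enabled, "capture_search_query": capture, "sanitize_keys": keys}


def sanitize(body, keys, placeholder="REDACTED"):
    """Hypothetical helper: redact values of matching keys anywhere in a JSON-like body."""
    lowered = {k.lower() for k in keys}
    if isinstance(body, dict):
        return {
            k: placeholder if str(k).lower() in lowered else sanitize(v, keys, placeholder)
            for k, v in body.items()
        }
    if isinstance(body, list):
        return [sanitize(item, keys, placeholder) for item in body]
    return body


config = load_config({CAPTURE_VAR: "sanitize", SANITIZE_VAR: "ssn, card"})
query = {"query": {"term": {"ssn": "123-45-6789"}}, "size": 1}
redacted = sanitize(query, config["sanitize_keys"])
```

Passing the environment as a mapping keeps the parser easy to test without mutating `os.environ`.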
Testing
We will have an "OpenTelemetry" test mode that will run the whole test suite with OpenTelemetry enabled and will test a simple API to make sure the exported spans are correct.
The InMemorySpanExporter allows end-to-end testing.
Endpoint id and path parts
Those will need to be passed to the transport.
Steps
- [x] https://github.com/elastic/elastic-transport-python/pull/150
- [x] https://github.com/elastic/elastic-transport-python/pull/151 and https://github.com/elastic/elasticsearch-py/pull/2457
- [x] #2466
- [x] https://github.com/elastic/elastic-transport-python/pull/155
- [ ] https://github.com/elastic/elastic-transport-python/pull/156 and https://github.com/elastic/elasticsearch-py/pull/2482
- [x] https://github.com/elastic/elasticsearch-py/pull/2479
- [ ] Add docs page and default OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_ENABLED to True
- [ ] Sanitization
Supported attributes
Required
- [x] db.system
- [x] db.elasticsearch.path_parts
- [x] db.operation
- [x] http.request.method
- [x] url.full
Recommended
- [x] db.elasticsearch.cluster.name
- [x] db.elasticsearch.node.name
- [x] db.statement
- [x] server.address
- [x] server.port
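
To make the attribute list concrete, the following sketch shows how the endpoint id and path parts passed to the transport could be turned into span attributes. The function `build_span_attributes` and its parameters are hypothetical; the attribute names are the ones listed above, with path parts assumed to be reported as `db.elasticsearch.path_parts.<key>`. Cluster and node names are left out because they depend on response metadata not covered here.

```python
from urllib.parse import urlsplit


def build_span_attributes(endpoint_id, path_parts, method, url, statement=None):
    """Hypothetical helper: map request metadata to the attributes listed above."""
    attributes = {
        "db.system": "elasticsearch",
        "db.operation": endpoint_id,
        "http.request.method": method,
        "url.full": url,
    }
    # One attribute per path part, e.g. db.elasticsearch.path_parts.index
    for key, value in path_parts.items():
        attributes[f"db.elasticsearch.path_parts.{key}"] = str(value)

    parsed = urlsplit(url)
    if parsed.hostname:
        attributes["server.address"] = parsed.hostname
    if parsed.port is not None:
        attributes["server.port"] = parsed.port

    # Only set when search query capture is enabled (sanitize or raw).
    if statement is not None:
        attributes["db.statement"] = statement
    return attributes


attrs = build_span_attributes(
    endpoint_id="search",
    path_parts={"index": "my-index"},
    method="POST",
    url="http://localhost:9200/my-index/_search",
)
```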