opentelemetry-python
opentelemetry-python copied to clipboard
Add exemplars feature
Description
Fixes https://github.com/open-telemetry/opentelemetry-python/issues/2407
API changes
opentelemetry.metrics.instrument: Add optionalContextargument to all record/add/set value[^1]opentelemetry.metrics.Observation: Add optionalContextattribute- Add
Exemplar,ExemplarFilters andExemplarReservoirs following the spec and the implementation in JS and Java SDK opentelemety.sdk.metrics.MeterProvider: Add optionalexemplar_filterattribute to the constructor
The filter will be stored in theSdkConfiguration(via a new attributeexemplar_filter)opentelemety.sdk.metrics._internal.Measurement: Add mandatorytime_unix_nanoandContextattributesopentelemetry.sdk.metrics._internal.*DataPoint: Add optionalexemplarsattributeopentelemetry.sdk.metrics._internal._Aggregation:- Add an attribute
_reservoirset from areservoir_factory - Add method
_collect_exemplarthat is called by the children classes oncollect - Make
aggregatenot abstracted any longer and add an optionalshould_sample_exemplarargument. If it is True (default), the exemplar reservoir will be offer toMeasurementto store an exemplar. That method need to be called viasuperon all children classes
- Add an attribute
opentelemetry.sdk.metrics.view.View: Add optional attributeexemplar_reservoir_factorydefining the exemplar reservoir factory per_Aggregationtype.- Add
should_sample_exemplarargument on methodconsume_measurementfromMeasurementConsumer,MetricReaderStorageand_ViewInstrumentMatch
Notes
- The filter must be configurable in
MetricProvider, hence I went for propagating theshould_samplevalue along side the measurements. This is the approach chosen in the .NET SDK. Another possibility would be to mimic the Java SDK approach that is wrapping the wanted exemplar reservoir in an exemplar reservoir applying first the filter (although I don't know right now how to implement that variant). - The reservoir factory provided to
Viewis based on_Aggregationtype. I find that non-optimal as those classes are meant to be protected when the factory is meant to be public. But I don't see an alternative. - The reservoir factory per
_Aggregationtype returns a(**kwargs): ExemplarReservoirfactory because some reservoirs need to be configured based on the aggregation parameters; e.g. the exponential histogram bucket size will impact the number of buckets in the exemplar reservoir.
Type of change
Please delete options that are not relevant.
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] This change requires a documentation update
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
- [ ] Test A
Does This PR Require a Contrib Repo Change?
- [ ] Yes. - Link to PR:
- [ ] No.
I guess it may impact exporter overthere??
Checklist:
- [ ] Followed the style guidelines of this project
- [ ] Changelogs have been updated
- [ ] Unit tests have been added
- [ ] Documentation has been updated
[^1]: This is similar to the Java and Javascript SDK.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: fcollonval / name: Frédéric Collonval (c29e0dd62bf60d39562e35a93aa0d58b22081370, 682a1762b35594d18604456c5d6f7ce5246b6049, d699f8d91c7ef3d3c1647aefd795cba83ef66b5a, 4f5efa7758499b335fc5a6bbfadf61a779e90554, 3b1e40dc6228767d02efe2b6cc35742eacfda756, 6a25608ebef6bf3214ad8b0df64541a52d8b83cb, 05ebc34baed6716c550807e611aab6036e3e7adb, 5aa8353adef8cbeefd61c3a056bf90c28f628fcf, e8aa164791cbcf98b6e19ea98e6b91c09a9863b4, 612404e4f5f0f912c8fe71956ca01ae3d285aff6, 2b7793a61142ca897758ef728fabc021855b8097, 5fc775ba134af7f48a075911d8376d7d0bccdd39, 6f74b3c41a61173bac4c31eefd8e65f681f6839c, 19b2db47ddf5296d846778d5cbd4295ce273c162, 1309b6131cf9813d4a961c00a0b1c9de35ccc49c, 0a13b62112464023682766afc5d5b1c2abc4020f, 2780df70f90cfd28ebb2d9bce65832737303566d, 80f040d269580459214a4338125b12770b6ed882, f0ecace0407a2ea62d25196880cf8340271021c7, 22cebeb50bf819b8874d7ea065acdcf05f9ebcdc, dcb44f012a143bfc421bb48cd2d7f09d3bc4b79f, e2b77783afac3748c07cb91b36fbd1dfe9db14e2, 5b32ffcbf900b23765368e35f833eeccb75d9c69, c0787abf22b8d28abe8bf2df04defbd09ce9bc66, f6f62338d774e1035e3a446fd90d64eae212cafc, e6c1d0109d58b723a8e5c4085bae512191ec9543, 04a21e053c5795fba9ae90191ed0293d187d749a, 85595047173c13a7d16715519a50f35e0d705b84, 028e414f1c709f2e5923ef43ce803b2cc7a26670, e3fe1cbff5c8bc911ddce9c696a0cdee6cca7d17, 0ea80dce7c0368e4ba2d8ef05a193bc3581a8d22, 9935e633680ebc1f1ecec4e2439df3f660ba1cbd, 74016f04110f723bb500f2ac74ecf41e5a55b023, 627da870d2856c7550dd70f65da97ff6824bdae0, 975700a23e0927159a0f7b9ff8de7086534925a1, b5734f5277723cde8e83e06a7fb8d87e69a4d5a9, 77d109efe89e9bb52b5c2f81b6da037a797b7d5a)
- :white_check_mark: login: czhang771 / name: Catherine Zhang (68e8824a88de0bdd756df2db6929c2f242b37092, afd4e2c28292e1375a2c1fc9266a67e26ef01ac8, bfaec2da28042ac681649fcab53b1ccb384afdb4, 351730c80cb19426023211a8c8f50025653abe5e, eece48d6e7b8a61d2ec89e53299562e5ff262f5f, 70f8bef09310b44d2ce0f2d9ea38d288b483da17, fadcefcf1f5b08ecb0b365b72e6dff6a987d90ae, ed02f8bff921a35466b599d1b019bd8856d10d4d, ecd03bc4db28f59c74a6dc6d98aa835c2f205e30, e7e4227fb1ab3afd29280a34f0cb7b9e3ec01edc)
- :white_check_mark: login: lzchen / name: Leighton Chen (ceba9f1a9d539387e11ca1bc7d637c9a9f33f1e2, 22a28755e9cf4e154eb5e8b7353b55fbf0fe24b5, 592270a8bc5e3f5fbb53ab5c92496bdf8135a580)
- :white_check_mark: login: xrmx / name: Riccardo Magliocchetti (b1355f3d9d15af30ff5a448d4e6c7ac4b8d202a2)
Dear maintainers,
I would like to contribute to this project by adding the support for exemplars.
Currently, I have added the base classes (based on the JavaScript SDK implementation). And now I'm facing challenges to integrate this into the code base. Analyzing the C# and Java SDK, it seems that the _Aggregation class is the best place to save the ExemplarReservoir:
https://github.com/open-telemetry/opentelemetry-python/blob/51b7e4cacf099953e830a56ed2523c13c06aece2/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py#L82
But this raises 3 questions:
- How best to provide the
ExemplarFilterand reservoir factory to the constructor? - How to get the
attributesin theaggregatemethod - this is the required variable missing compared to Java? - How to get the
contextin thecollectmethod - this is the required variable missing compared to Java?
Any additional pointers would be appreciated.
@fcollonval
Thanks so much for picking this up. I am not too familiar with exemplars but happy to open up a discussion about the architectural design. A couple of questions/changes I have in mind:
-
The time the API call was made to record a Measurement.
This would probably entail adding a timestamp to Measurement to represent when the api call was made/when the Measurement was created.
-
The associated trace id and span id of the active Span within Context of the Measurement at API call time.
Most likely another field in Measurement to store Context information. This would most likely get populated at time of recording/Measurement creation. Hopefully this addresses your question 3.
- Where in the metrics pipeline do we envision the
ExemplarFilterandExemplarReservoirto "hook" onto (aka when shouldofferandcollectbe called?).
offer can probably be called when a Measurement is being "processed", so I'm envisioning a separate method in Aggregation called process_exemplar(Measurement, attributes), where attributes are the POST view-filtered attributes that actually show up in the time-series and measurement.attributes are the pre-view filtered attributes that were actually used to record. This probably addresses your question 2.
collect is most likely called when the Aggregation calls collect as well.
The "collect" method MUST return accumulated Exemplars.
I am not exactly sure what "accumulated Exemplars" means but from my basic understanding it seems like we just need to return the set of Exemplar s in memory that we have been accumulating.
- For question 1, I am not too sure what it is you are asking but this is my preliminary understanding of how the components should be constructed:
A new ExemplarReservoir MUST be created for every known timeseries data point, as determined by aggregation and view configuration.
There will most likely be a single ExemplarReservoir instance per Aggregation since each represent a known timeseries which is created when the Aggregation is created.
The ExemplarFilter SHOULD be a configuration parameter of a MeterProvider for an SDK.
Most likely passed down as a reference to the Aggregation.
How best to provide the ExemplarFilter and reservoir factory to the constructor?
Is there a need for a factory? What are you envisioning here?
Feel free to add any insights or correct any misunderstandings I may have :)
Thanks a lot @lzchen for all the information.
Thanks for pointing out about the Measurement it clarifies how I can provide the missing info the ExemplarReservoir.
The "collect" method MUST return accumulated Exemplars.
I am not exactly sure what "accumulated Exemplars" means but from my basic understanding it seems like we just need to return the set of Exemplar s in memory that we have been accumulating.
This is also my understanding. The reservoir will provide what it collected.
A new ExemplarReservoir MUST be created for every known timeseries data point, as determined by aggregation and view configuration.
There will most likely be a single ExemplarReservoir instance per Aggregation since each represent a known timeseries which is created when the Aggregation is created.
:+1:
The ExemplarFilter SHOULD be a configuration parameter of a MeterProvider for an SDK.
Most likely passed down as a reference to the Aggregation.
:+1: I missed that point in the spec
reservoir factory to the constructor?
Is there a need for a factory? What are you envisioning here?
From https://github.com/open-telemetry/opentelemetry-specification/blob/5381b55dd8e6adcbf99e153533e5ad8ea3dd6b38/specification/metrics/sdk.md?plain=1#L387
The SDK MUST accept the following stream configuration parameters:
exemplar_reservoir: A functional type that generates an exemplar reservoir aMeterProviderwill use when storing exemplars. This functional type needs to be a factory or callback similar to aggregation selection functionality which allows different reservoirs to be chosen by the aggregation.
This is implemented in the Java SDK - and I guess it is required to support:
Custom ExemplarReservoir The SDK MUST provide a mechanism for SDK users to provide their own ExemplarReservoir implementation.
Thanks a lot for the pointers, I'll work towards a first working version.
Hi there! I have also started looking into adding exemplars as well and would love to chat with you about it (I've reached out on linkedin).
@catherine-m-zhang you should have receive an invitation to my fork. Could you accept it so I can grant you enough rights to push on the branch directly?
I updated the PR description with the changes.
Here is a status of the work as far as I can tell:
- [x] CODE Implement base classes
- [x] CODE Connect the dots with the existing code
- [x] TEST Fix existing tests and lint the code (some tests may not be fixed yet as I only checked the ones in
opentelemetry-sdk/tests/metrics - [x] TEST Add unit tests for the new classes. I started for the filters as an exemple in
opentelemetry-sdk/tests/metrics/test_exemplarfilter.py. You can take inspiration from the JS SDK. - [x] TEST Add more unit tests for the new API in particular for all
_Aggregationclasses, for the reservoir factory that is customizable inView. - [x] CODE The exemplar attributes should be filtered to the ones not included in the metric data point (see spec).
- [ ] CODE Update the exporters to export the exemplars - I did not look at all if this will work out of the box or not.
- [x] DOC Add documentation/examples for the feature
I added an integration test that may be helpful in https://github.com/open-telemetry/opentelemetry-python/pull/4094/files#diff-46edae6a309b04be4e5143e01feb450ffc9f576c1b146bd288db31ed175c8982
@lzchen here is a first workish version - feel free to have a look especially at the API changes to see if they fit the current API logic.
I'm off for the next 15 days but Catherine offers to help on the subject.
@catherine-m-zhang
Is this pr ready for review? If that's the case, feel free to mark as a PR instead of draft.
@lzchen I don't think I have the correct access to mark this as ready for review but this is ready to be reviewed!
Could you add an example in samples displaying how someone would specify exemplar filter through MeterProvider and exemplar_reservoir_factory through Views configuration?
@fcollonval @catherine-m-zhang
Thanks a lot for working on this. I did a extensive first-pass review on the PR. Overall looks pretty good. Seeing as it is pretty large, feel free to address the comments individually or even split up the PR to smaller pieces and referencing the comments if you'd like more eyes on this and want to get this merged quicker. It'll make reviewing easier for others that are not as familiar with Exemplars.
@fcollonval @catherine-m-zhang
Thanks a lot for working on this. I did a extensive first-pass review on the PR. Overall looks pretty good. Seeing as it is pretty large, feel free to address the comments individually or even split up the PR to smaller pieces and referencing the comments if you'd like more eyes on this and want to get this merged quicker. It'll make reviewing easier for others that are not as familiar with Exemplars. @lzchen got it, currently working on individually addressing the comments! quick question, i have just enabled exporting exemplars for otlp and am working on prometheus. would a basic version of the PR be approved to merge with just the otlp exporter or is enabling the prometheus exporter a requirement.
Remaining tasks as of September 2nd:
- Update the exporters to export the exemplars - I did not look at all if this will work out of the box or not:
- [x] otlp
- [ ] others in core --> follow-up PRs
- [ ] contrib --> follow-up PRs
- [x] Support environment variables for exemplar
- [x] DOC write more doc and examples
- [x] TEST do another pass to see if we should add some additional tests
Thanks @czhang771 for your contributions
Thanks @lzchen and @emdneto for your reviews. I have some questions regarding your review. Could you have another look?
Dear maintainers, this PR is finalized (CI should be green :crossed_fingers:) and ready for final review.
Gentle ping to @lzchen @pmcollins @emdneto
@fcollonval what else is remaining to close on this PR?