opentelemetry-python icon indicating copy to clipboard operation
opentelemetry-python copied to clipboard

Add exemplars feature

Open fcollonval opened this issue 1 year ago • 12 comments

Description

Fixes https://github.com/open-telemetry/opentelemetry-python/issues/2407

API changes

  • opentelemetry.metrics.instrument: Add optional Context argument to all record/add/set value[^1]
  • opentelemetry.metrics.Observation: Add optional Context attribute
  • Add Exemplar, ExemplarFilters and ExemplarReservoirs following the spec and the implementation in JS and Java SDK
  • opentelemety.sdk.metrics.MeterProvider: Add optional exemplar_filter attribute to the constructor
    The filter will be stored in the SdkConfiguration (via a new attribute exemplar_filter)
  • opentelemety.sdk.metrics._internal.Measurement: Add mandatory time_unix_nano and Context attributes
  • opentelemetry.sdk.metrics._internal.*DataPoint: Add optional exemplars attribute
  • opentelemetry.sdk.metrics._internal._Aggregation:
    • Add an attribute _reservoir set from a reservoir_factory
    • Add method _collect_exemplar that is called by the children classes on collect
    • Make aggregate not abstracted any longer and add an optional should_sample_exemplar argument. If it is True (default), the exemplar reservoir will be offer to Measurement to store an exemplar. That method need to be called via super on all children classes
  • opentelemetry.sdk.metrics.view.View: Add optional attribute exemplar_reservoir_factory defining the exemplar reservoir factory per _Aggregation type.
  • Add should_sample_exemplar argument on method consume_measurement from MeasurementConsumer, MetricReaderStorage and _ViewInstrumentMatch

Notes

  • The filter must be configurable in MetricProvider, hence I went for propagating the should_sample value along side the measurements. This is the approach chosen in the .NET SDK. Another possibility would be to mimic the Java SDK approach that is wrapping the wanted exemplar reservoir in an exemplar reservoir applying first the filter (although I don't know right now how to implement that variant).
  • The reservoir factory provided to View is based on _Aggregation type. I find that non-optimal as those classes are meant to be protected when the factory is meant to be public. But I don't see an alternative.
  • The reservoir factory per _Aggregation type returns a (**kwargs): ExemplarReservoir factory because some reservoirs need to be configured based on the aggregation parameters; e.g. the exponential histogram bucket size will impact the number of buckets in the exemplar reservoir.

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • [ ] Test A

Does This PR Require a Contrib Repo Change?

  • [ ] Yes. - Link to PR:
  • [ ] No.

I guess it may impact exporter overthere??

Checklist:

  • [ ] Followed the style guidelines of this project
  • [ ] Changelogs have been updated
  • [ ] Unit tests have been added
  • [ ] Documentation has been updated

[^1]: This is similar to the Java and Javascript SDK.

fcollonval avatar Jul 31 '24 11:07 fcollonval

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: fcollonval / name: Frédéric Collonval (c29e0dd62bf60d39562e35a93aa0d58b22081370, 682a1762b35594d18604456c5d6f7ce5246b6049, d699f8d91c7ef3d3c1647aefd795cba83ef66b5a, 4f5efa7758499b335fc5a6bbfadf61a779e90554, 3b1e40dc6228767d02efe2b6cc35742eacfda756, 6a25608ebef6bf3214ad8b0df64541a52d8b83cb, 05ebc34baed6716c550807e611aab6036e3e7adb, 5aa8353adef8cbeefd61c3a056bf90c28f628fcf, e8aa164791cbcf98b6e19ea98e6b91c09a9863b4, 612404e4f5f0f912c8fe71956ca01ae3d285aff6, 2b7793a61142ca897758ef728fabc021855b8097, 5fc775ba134af7f48a075911d8376d7d0bccdd39, 6f74b3c41a61173bac4c31eefd8e65f681f6839c, 19b2db47ddf5296d846778d5cbd4295ce273c162, 1309b6131cf9813d4a961c00a0b1c9de35ccc49c, 0a13b62112464023682766afc5d5b1c2abc4020f, 2780df70f90cfd28ebb2d9bce65832737303566d, 80f040d269580459214a4338125b12770b6ed882, f0ecace0407a2ea62d25196880cf8340271021c7, 22cebeb50bf819b8874d7ea065acdcf05f9ebcdc, dcb44f012a143bfc421bb48cd2d7f09d3bc4b79f, e2b77783afac3748c07cb91b36fbd1dfe9db14e2, 5b32ffcbf900b23765368e35f833eeccb75d9c69, c0787abf22b8d28abe8bf2df04defbd09ce9bc66, f6f62338d774e1035e3a446fd90d64eae212cafc, e6c1d0109d58b723a8e5c4085bae512191ec9543, 04a21e053c5795fba9ae90191ed0293d187d749a, 85595047173c13a7d16715519a50f35e0d705b84, 028e414f1c709f2e5923ef43ce803b2cc7a26670, e3fe1cbff5c8bc911ddce9c696a0cdee6cca7d17, 0ea80dce7c0368e4ba2d8ef05a193bc3581a8d22, 9935e633680ebc1f1ecec4e2439df3f660ba1cbd, 74016f04110f723bb500f2ac74ecf41e5a55b023, 627da870d2856c7550dd70f65da97ff6824bdae0, 975700a23e0927159a0f7b9ff8de7086534925a1, b5734f5277723cde8e83e06a7fb8d87e69a4d5a9, 77d109efe89e9bb52b5c2f81b6da037a797b7d5a)
  • :white_check_mark: login: czhang771 / name: Catherine Zhang (68e8824a88de0bdd756df2db6929c2f242b37092, afd4e2c28292e1375a2c1fc9266a67e26ef01ac8, bfaec2da28042ac681649fcab53b1ccb384afdb4, 351730c80cb19426023211a8c8f50025653abe5e, eece48d6e7b8a61d2ec89e53299562e5ff262f5f, 70f8bef09310b44d2ce0f2d9ea38d288b483da17, fadcefcf1f5b08ecb0b365b72e6dff6a987d90ae, ed02f8bff921a35466b599d1b019bd8856d10d4d, ecd03bc4db28f59c74a6dc6d98aa835c2f205e30, e7e4227fb1ab3afd29280a34f0cb7b9e3ec01edc)
  • :white_check_mark: login: lzchen / name: Leighton Chen (ceba9f1a9d539387e11ca1bc7d637c9a9f33f1e2, 22a28755e9cf4e154eb5e8b7353b55fbf0fe24b5, 592270a8bc5e3f5fbb53ab5c92496bdf8135a580)
  • :white_check_mark: login: xrmx / name: Riccardo Magliocchetti (b1355f3d9d15af30ff5a448d4e6c7ac4b8d202a2)

Dear maintainers,

I would like to contribute to this project by adding the support for exemplars.

Currently, I have added the base classes (based on the JavaScript SDK implementation). And now I'm facing challenges to integrate this into the code base. Analyzing the C# and Java SDK, it seems that the _Aggregation class is the best place to save the ExemplarReservoir:

https://github.com/open-telemetry/opentelemetry-python/blob/51b7e4cacf099953e830a56ed2523c13c06aece2/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py#L82

But this raises 3 questions:

  • How best to provide the ExemplarFilter and reservoir factory to the constructor?
  • How to get the attributes in the aggregate method - this is the required variable missing compared to Java?
  • How to get the context in the collect method - this is the required variable missing compared to Java?

Any additional pointers would be appreciated.

fcollonval avatar Jul 31 '24 11:07 fcollonval

@fcollonval

Thanks so much for picking this up. I am not too familiar with exemplars but happy to open up a discussion about the architectural design. A couple of questions/changes I have in mind:

  1. The time the API call was made to record a Measurement.

This would probably entail adding a timestamp to Measurement to represent when the api call was made/when the Measurement was created.

  1. The associated trace id and span id of the active Span within Context of the Measurement at API call time.

Most likely another field in Measurement to store Context information. This would most likely get populated at time of recording/Measurement creation. Hopefully this addresses your question 3.

  1. Where in the metrics pipeline do we envision the ExemplarFilter and ExemplarReservoir to "hook" onto (aka when should offer and collect be called?).

offer can probably be called when a Measurement is being "processed", so I'm envisioning a separate method in Aggregation called process_exemplar(Measurement, attributes), where attributes are the POST view-filtered attributes that actually show up in the time-series and measurement.attributes are the pre-view filtered attributes that were actually used to record. This probably addresses your question 2.

collect is most likely called when the Aggregation calls collect as well.

The "collect" method MUST return accumulated Exemplars.

I am not exactly sure what "accumulated Exemplars" means but from my basic understanding it seems like we just need to return the set of Exemplar s in memory that we have been accumulating.

  1. For question 1, I am not too sure what it is you are asking but this is my preliminary understanding of how the components should be constructed:

A new ExemplarReservoir MUST be created for every known timeseries data point, as determined by aggregation and view configuration.

There will most likely be a single ExemplarReservoir instance per Aggregation since each represent a known timeseries which is created when the Aggregation is created.

The ExemplarFilter SHOULD be a configuration parameter of a MeterProvider for an SDK.

Most likely passed down as a reference to the Aggregation.

How best to provide the ExemplarFilter and reservoir factory to the constructor?

Is there a need for a factory? What are you envisioning here?

Feel free to add any insights or correct any misunderstandings I may have :)

lzchen avatar Aug 01 '24 19:08 lzchen

Thanks a lot @lzchen for all the information.

Thanks for pointing out about the Measurement it clarifies how I can provide the missing info the ExemplarReservoir.

The "collect" method MUST return accumulated Exemplars.

I am not exactly sure what "accumulated Exemplars" means but from my basic understanding it seems like we just need to return the set of Exemplar s in memory that we have been accumulating.

This is also my understanding. The reservoir will provide what it collected.

A new ExemplarReservoir MUST be created for every known timeseries data point, as determined by aggregation and view configuration.

There will most likely be a single ExemplarReservoir instance per Aggregation since each represent a known timeseries which is created when the Aggregation is created.

:+1:

The ExemplarFilter SHOULD be a configuration parameter of a MeterProvider for an SDK.

Most likely passed down as a reference to the Aggregation.

:+1: I missed that point in the spec

reservoir factory to the constructor?

Is there a need for a factory? What are you envisioning here?

From https://github.com/open-telemetry/opentelemetry-specification/blob/5381b55dd8e6adcbf99e153533e5ad8ea3dd6b38/specification/metrics/sdk.md?plain=1#L387

The SDK MUST accept the following stream configuration parameters:

  • exemplar_reservoir: A functional type that generates an exemplar reservoir a MeterProvider will use when storing exemplars. This functional type needs to be a factory or callback similar to aggregation selection functionality which allows different reservoirs to be chosen by the aggregation.

This is implemented in the Java SDK - and I guess it is required to support:

Custom ExemplarReservoir The SDK MUST provide a mechanism for SDK users to provide their own ExemplarReservoir implementation.

Thanks a lot for the pointers, I'll work towards a first working version.

fcollonval avatar Aug 02 '24 09:08 fcollonval

Hi there! I have also started looking into adding exemplars as well and would love to chat with you about it (I've reached out on linkedin).

catherine-m-zhang avatar Aug 09 '24 21:08 catherine-m-zhang

@catherine-m-zhang you should have receive an invitation to my fork. Could you accept it so I can grant you enough rights to push on the branch directly?

I updated the PR description with the changes.

Here is a status of the work as far as I can tell:

  • [x] CODE Implement base classes
  • [x] CODE Connect the dots with the existing code
  • [x] TEST Fix existing tests and lint the code (some tests may not be fixed yet as I only checked the ones in opentelemetry-sdk/tests/metrics
  • [x] TEST Add unit tests for the new classes. I started for the filters as an exemple in opentelemetry-sdk/tests/metrics/test_exemplarfilter.py. You can take inspiration from the JS SDK.
  • [x] TEST Add more unit tests for the new API in particular for all _Aggregation classes, for the reservoir factory that is customizable in View.
  • [x] CODE The exemplar attributes should be filtered to the ones not included in the metric data point (see spec).
  • [ ] CODE Update the exporters to export the exemplars - I did not look at all if this will work out of the box or not.
  • [x] DOC Add documentation/examples for the feature

I added an integration test that may be helpful in https://github.com/open-telemetry/opentelemetry-python/pull/4094/files#diff-46edae6a309b04be4e5143e01feb450ffc9f576c1b146bd288db31ed175c8982

fcollonval avatar Aug 14 '24 09:08 fcollonval

@lzchen here is a first workish version - feel free to have a look especially at the API changes to see if they fit the current API logic.

I'm off for the next 15 days but Catherine offers to help on the subject.

fcollonval avatar Aug 14 '24 09:08 fcollonval

@catherine-m-zhang

Is this pr ready for review? If that's the case, feel free to mark as a PR instead of draft.

lzchen avatar Aug 16 '24 21:08 lzchen

@lzchen I don't think I have the correct access to mark this as ready for review but this is ready to be reviewed!

czhang771 avatar Aug 23 '24 23:08 czhang771

Could you add an example in samples displaying how someone would specify exemplar filter through MeterProvider and exemplar_reservoir_factory through Views configuration?

lzchen avatar Aug 26 '24 21:08 lzchen

@fcollonval @catherine-m-zhang

Thanks a lot for working on this. I did a extensive first-pass review on the PR. Overall looks pretty good. Seeing as it is pretty large, feel free to address the comments individually or even split up the PR to smaller pieces and referencing the comments if you'd like more eyes on this and want to get this merged quicker. It'll make reviewing easier for others that are not as familiar with Exemplars.

lzchen avatar Aug 27 '24 19:08 lzchen

@fcollonval @catherine-m-zhang

Thanks a lot for working on this. I did a extensive first-pass review on the PR. Overall looks pretty good. Seeing as it is pretty large, feel free to address the comments individually or even split up the PR to smaller pieces and referencing the comments if you'd like more eyes on this and want to get this merged quicker. It'll make reviewing easier for others that are not as familiar with Exemplars. @lzchen got it, currently working on individually addressing the comments! quick question, i have just enabled exporting exemplars for otlp and am working on prometheus. would a basic version of the PR be approved to merge with just the otlp exporter or is enabling the prometheus exporter a requirement.

czhang771 avatar Aug 29 '24 18:08 czhang771

Remaining tasks as of September 2nd:

  • Update the exporters to export the exemplars - I did not look at all if this will work out of the box or not:
    • [x] otlp
    • [ ] others in core --> follow-up PRs
    • [ ] contrib --> follow-up PRs
  • [x] Support environment variables for exemplar
  • [x] DOC write more doc and examples
  • [x] TEST do another pass to see if we should add some additional tests

fcollonval avatar Sep 02 '24 13:09 fcollonval

Thanks @czhang771 for your contributions

Thanks @lzchen and @emdneto for your reviews. I have some questions regarding your review. Could you have another look?

fcollonval avatar Sep 02 '24 13:09 fcollonval

Dear maintainers, this PR is finalized (CI should be green :crossed_fingers:) and ready for final review.

Gentle ping to @lzchen @pmcollins @emdneto

fcollonval avatar Sep 05 '24 08:09 fcollonval

@fcollonval what else is remaining to close on this PR?

splusq avatar Sep 12 '24 19:09 splusq