opentelemetry-js icon indicating copy to clipboard operation
opentelemetry-js copied to clipboard

feat(exporter-metrics-otlp-proto)!: rewrite exporter

Open pichlermarc opened this issue 1 year ago • 4 comments

NOTE: Draft PR, do not review yet.

Which problem is this PR solving?

It is currently very hard to test individual features of the OTLP exporters. This is due to the following issues:

  • configuration is lumped in with other functionality
    • this makes it very hard to test configuration changes without elaborate testing setups, and therefore hard to respond to bug issues in a reasonable time frame
  • transport code is not split from the rest (#4116)
    • this makes it very hard to test new features in isolation, and makes it very hard to add new transports (such as fetch, or others)
  • browser features are mixed in with Node.js specific features
    • changing one thing to may affect another, where this is not desired

This PR proposes a new structure for all OTLP exporters, starting with @opentelemetry/exporter-metrics-otlp-proto. It splits it into the following key parts by introducing interfaces for them, which allows us to mock any of these for testing:

  • IExporterTransport (#4116)
    • transport code based on http (nodejs), grpc (nodejs), xhr (Browser), or sendBeacon (Browser)
  • ISerializer
    • serialization code for json or protobuf (can be re-used with grpc)
    • having an interface that does (object in -> raw data out) allows us to make a transport that's completely agnostic of what it's sending.
  • ITransformer
    • transform internal signal representation to serializable signal representation that can then be re-used with all 3 transport variants, we may be able to move the implementations the @opentelemetry/otlp-transformer package.
  • (TODO: not sure about this one) IExportPromiseQueue
    • a queue for export promises to await on shutdown that was previously part of the exporter
  • IResponseHandler
    • a way to handle responses from the export request #3183

It further split out configuration-code into separate files that handle defaults-merging and environment parsing, and allows us to leave out references to environment configuration from exporters intended for the browser. This aids with testability as we now can test environment parsing separately from default-merging. Partially addresses #3193, where the wrong environment variable was used for the compression value (the trace setting was used over metrics).

It also introduces a factory function for exporters in each exporter package, that's intended to replace the OLTP*Exporter exporter classes. To keep backwards compatibility, the OTLP*Exporter classes are still retained and they translate the "old" configuration to the "new" one.

Reasoning for some design decisions:

  • introducing and splitting @opentelemetry/otlp-http-exporter-node-base @opentelemetry/otlp-http-exporter-browser-base packages:

    • Browser and Node transports work very differently, passing the same parameters to the constructor/factory function is confusing and a lot of it is unused. Having the same package means that we have to export the same types. Having separate ones allow us to differentiate.
    • both of these were part of @opentelemetry/otlp-exporter-base
    • @opentelemetry/otlp-grpc-exporter-base existed already, finding the HTTP-related code was not easy.
  • introducing the @opentelemetry/otlp-metrics-exporter-base package:

    • we should not duplicate features across the different exporters as they should work the same and stay in sync; examples:
      • defaults
      • environment variable parsing
      • temporality selectors
  • keeping legacy exporter class vs factory function approach:

    • we've had them for a very long time (longer than I can remember) and we need to keep some backwards compatibility as it is the most commonly used interface.
    • I removed all members that should not be accessed anyway (this is a breaking change)
      • this is required due to the splitting of configuration from the exporter.
      • using them would often not re-configure the exporter, therefore I think removing them prevents wrong use of these fields (url, headers, etc.)
  • retiring everything that's not generated protobuf from the proto-exporter-base package.

    • I'm not sure about this, we could also keep it around
    • my intention was to clearly seperate metrics/logs and traces, but the pre-generated code has metrics/logs/traces mixed up together in one file anyway
  • adding a browser exporter for @opentelemetry/exporter-metrics-otlp-proto (#4098):

    • we did not have one before (trace and logs both have one)
    • this is to prove that the new design works in the Browser without needing to change an existing browser exporter
  • using factory functions and interfaces over exposing the actual types:

    • we've had problems before with types becoming incompatible even when private fields were added
    • using interfaces and factory functions solves that issue partially, as we don't expose the full class.
    • The contract with our users is also different when we do it this way - we only guarantee the minimum set of functionality on the exporter, no accidentally exposing things like url, headers or other things that can be taken as a signal that these can/should be modified on runtime.

Open questions before this PR is ready for review:

  • This is a large change and aims to be backward compatible.
    • Target main, or should we target next?
  • Is the general structure (interfaces, the way exporters are constructed, etc.) fine?
    • Answers don't need to go into details, these we can leave for the actual PRs.
    • Is there anything you're missing from the interfaces that should be there?
  • Is the approach to move pre-existing code into the legacy directory first fine?
    • This is done to ensure that it's easy to remove the old code once all exporters are migrated

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • [x] Unit Tests
  • [x] Manually testing against the OTel Collector / OTLP Metrics APIs directly

Checklist:

  • [ ] Followed the style guidelines of this project
  • [ ] Unit tests have been added
  • [ ] Documentation has been updated

pichlermarc avatar Jan 12 '24 15:01 pichlermarc

Codecov Report

Attention: Patch coverage is 78.27381% with 73 lines in your changes missing coverage. Please review.

Project coverage is 91.65%. Comparing base (4655895) to head (e9f90ca). Report is 177 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4415      +/-   ##
==========================================
- Coverage   92.23%   91.65%   -0.59%     
==========================================
  Files         336      350      +14     
  Lines        9533     9853     +320     
  Branches     2022     2096      +74     
==========================================
+ Hits         8793     9031     +238     
- Misses        740      822      +82     
Files Coverage Δ
...ics-otlp-proto/src/internal/metrics-transformer.ts 100.00% <100.00%> (ø)
...p-exporter-base/src/common/otlp-export-delegate.ts 100.00% <100.00%> (ø)
...tlp-exporter-base/src/common/retrying-transport.ts 100.00% <100.00%> (ø)
.../otlp-exporter-base/src/legacy/OTLPExporterBase.ts 95.23% <ø> (ø)
...al/packages/otlp-exporter-base/src/legacy/index.ts 100.00% <100.00%> (ø)
...legacy/platform/browser/OTLPExporterBrowserBase.ts 17.14% <ø> (ø)
...exporter-base/src/legacy/platform/browser/index.ts 100.00% <ø> (ø)
...-exporter-base/src/legacy/platform/browser/util.ts 34.84% <ø> (ø)
...es/otlp-exporter-base/src/legacy/platform/index.ts 100.00% <ø> (ø)
...e/src/legacy/platform/node/OTLPExporterNodeBase.ts 50.00% <ø> (ø)
... and 18 more

... and 13 files with indirect coverage changes

codecov[bot] avatar Jan 12 '24 15:01 codecov[bot]

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar May 06 '24 06:05 github-actions[bot]

removing stale as this is still active as a reference for future PRs, will close once all changes are merged.

pichlermarc avatar May 07 '24 10:05 pichlermarc

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar Jul 22 '24 06:07 github-actions[bot]