opentelemetry-js
How to make metadata accessible
I am exploring ways to enhance our handling of metadata for instrumentations, aiming to streamline processes and boost efficiency.
Instrumentation (or OpenTelemetry component) metadata comprises static information about OpenTelemetry JS instrumentation (or other components) that is valuable for distributions, control planes, APMs, and similar tools.
We currently record the name and version for each instrumentation, which also serve as the scope name and version for the signals we emit.
Although metadata is not recorded into signals, it can significantly enhance user experience and automate tasks when utilized by distributions, offering a smoother and more intuitive interface.
Metadata Examples
- instrumentation description - this text is currently found only in package.json. It provides a concise, user-facing description that includes the instrumented packages and OpenTelemetry context. It was aligned across the codebase to have consistent and meaningful content in #4715 and https://github.com/open-telemetry/opentelemetry-js-contrib/pull/2202. Example text: "OpenTelemetry instrumentation for the amqplib messaging client for RabbitMQ"
- Instrumented packages and supported version range - this text is currently only found in the README.md of each instrumentation. https://github.com/open-telemetry/opentelemetry-js-contrib/pull/2196 is an attempt to align it across the codebase. The instrumented package is the user-facing package name, which can differ from the "patched packages" which init() returns. The instrumented package is the most user-friendly name to show in documentation and UIs, so it is quite useful IMO.
- github repository - where the code can be found ("open-telemetry/opentelemetry-js-contrib", "open-telemetry/opentelemetry-js", or third party repos). It is currently found in the package.json for each instrumentation.
- github path - the path inside the github repository where the code can be found. For example - plugins/node/instrumentation-amqplib. This info can potentially be extracted from the "homepage" attribute in package.json.
- stability status
- semantic conventions version implementation
- emitted signals
and more info that we might need one day...
Essentially, any information that might be useful for users to consume through various interfaces (documentation, README, UI, links, status) in its raw format.
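To make the list above more concrete, here is a rough sketch of how these fields could be typed. All type and field names are hypothetical (nothing like this exists in OpenTelemetry JS today), the stability and signal values are my assumption, and the version range is a placeholder rather than the real supported range.

```typescript
// Hypothetical types: these names are illustrative only and are not part of
// any existing OpenTelemetry JS package.
type StabilityStatus = 'experimental' | 'stable' | 'deprecated'; // assumed values
type SignalKind = 'traces' | 'metrics' | 'logs';

interface InstrumentedPackage {
  // User-facing package name (may differ from the "patched packages").
  name: string;
  // Supported semver range, as documented in the README today.
  supportedVersions: string;
}

interface ComponentMetadata {
  description: string;
  instrumentedPackages: InstrumentedPackage[];
  githubRepository: string;
  githubPath: string;
  // Fields that are not recorded anywhere yet are optional.
  stability?: StabilityStatus;
  semconvVersion?: string;
  emittedSignals?: SignalKind[];
}

const amqplibMetadata: ComponentMetadata = {
  description:
    'OpenTelemetry instrumentation for the amqplib messaging client for RabbitMQ',
  // '*' is a placeholder, not the real supported version range.
  instrumentedPackages: [{ name: 'amqplib', supportedVersions: '*' }],
  githubRepository: 'open-telemetry/opentelemetry-js-contrib',
  githubPath: 'plugins/node/instrumentation-amqplib',
};

console.log(JSON.stringify(amqplibMetadata, null, 2));
```

Keeping the not-yet-recorded fields optional means they can be filled in gradually without touching every instrumentation at once.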
Usages
Here are a few practical applications of how this metadata can be effectively utilized:
- distribution tools, to create automatic READMEs, docs, and any markdown file, where the content is auto-generated based on this data. See auto-instrumentations-node README. The instrumentations list can be auto-generated and include more info for the user, like the instrumentation description, instrumented package names and supported versions, as well as a link to the homepage. This can enhance the user experience of our contrib distribution users, and can also be leveraged by other third party distributions. Auto-generated text reduces mistakes and maintenance, promotes consistent content, and is less prone to getting out of sync.
- OpenTelemetry control planes - If an OpenTelemetry control plane displays information about the components at runtime (via UI, files, or databases), details like the instrumented package can be useful for user-facing interfaces.
- Enhancements for UIs - providing enriched information about instrumentation can significantly improve the user experience when interacting with these details.
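As an illustration of the README use case, a generator could look roughly like the sketch below. The `ReadmeEntry` shape and the `renderInstrumentationList` function are hypothetical, and the link format assumes the code lives on the `main` branch of a GitHub repository.

```typescript
// Hypothetical input shape for a README generator; not an existing API.
interface ReadmeEntry {
  packageName: string;
  description: string;
  instrumentedPackages: string[];
  githubRepository: string;
  githubPath: string;
}

// Renders one markdown bullet per instrumentation, sorted by package name.
function renderInstrumentationList(entries: ReadmeEntry[]): string {
  return [...entries]
    .sort((a, b) => a.packageName.localeCompare(b.packageName))
    .map((e) => {
      // Assumption: the code is browsable under the "main" branch.
      const url = `https://github.com/${e.githubRepository}/tree/main/${e.githubPath}`;
      const pkgs = e.instrumentedPackages.join(', ');
      return `- [${e.packageName}](${url}): ${e.description} (instruments: ${pkgs})`;
    })
    .join('\n');
}

const readmeSection = renderInstrumentationList([
  {
    packageName: '@opentelemetry/instrumentation-amqplib',
    description:
      'OpenTelemetry instrumentation for the amqplib messaging client for RabbitMQ',
    instrumentedPackages: ['amqplib'],
    githubRepository: 'open-telemetry/opentelemetry-js-contrib',
    githubPath: 'plugins/node/instrumentation-amqplib',
  },
]);

console.log(readmeSection);
```

Because the output is derived only from structured data, there is no markdown parsing or naming heuristic involved.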
Suggestion
I want to suggest aggregating the metadata to achieve the goals above. I can work on the relevant PRs to implement something if there is an agreement. I will start with just the info we already have available, and then introduce a script for the auto-instrumentations-node README auto-generation and enhancement. Additionally, I plan to use this data in the Odigos distribution of the JS agent to auto-generate a Node.js section in the Odigos documentation, and potentially to report instrumentation statuses back to the Odigos control plane based on this data.
Some objectives to consider:
- bundle size for web packages
- programmatic API to access the data, which does not involve parsing markdown, heuristics on naming, or exception tables.
- auto generate it when possible, see https://github.com/open-telemetry/opentelemetry-js-contrib/pull/2203
- making sure we are TypeScript-friendly for future additions and changes to this interface
- nice to have: all the data in a single interface
- nice to have: make the information available at runtime from the instrumentation class.
Options
- The simplest and most straightforward way would be to add this data to the instrumentation interface, and then have each instrumentation set it up:
  - as a constructor argument, similar to the instrumentation name and version, which are already passed this way
  - as a function that an instrumentation can override to return a metadata object, like the current init() function for patched packages info.
  - by defining an optional property on the base class which will expose this data on instrumentation instances.
If we decide to proceed this way, we must address TypeScript compatibility issues across versions to ensure that adding new properties does not introduce complexity.
Consider omitting it from web components for the moment, so as not to increase bundle size.
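A minimal sketch of the first two variants of this option, using a simplified stand-in for the base class rather than the real one. `InstrumentationMetadata`, `SketchInstrumentationBase`, and `getMetadata()` are hypothetical names, and the `0.0.0` versions are placeholders.

```typescript
// Hypothetical metadata shape. Every field is optional so that new fields can
// be added later without breaking existing instrumentations.
interface InstrumentationMetadata {
  description?: string;
  instrumentedPackages?: string[];
}

// Simplified stand-in for the real base class, for illustration only.
class SketchInstrumentationBase {
  readonly instrumentationName: string;
  readonly instrumentationVersion: string;
  private readonly _metadata: InstrumentationMetadata;

  // Variant 1: metadata is passed as a constructor argument, next to the
  // name and version.
  constructor(
    name: string,
    version: string,
    metadata: InstrumentationMetadata = {}
  ) {
    this.instrumentationName = name;
    this.instrumentationVersion = version;
    this._metadata = metadata;
  }

  // Variant 2: a function that subclasses may override.
  getMetadata(): InstrumentationMetadata {
    return this._metadata;
  }
}

class ViaConstructor extends SketchInstrumentationBase {
  constructor() {
    // '0.0.0' is a placeholder version.
    super('@opentelemetry/instrumentation-amqplib', '0.0.0', {
      instrumentedPackages: ['amqplib'],
    });
  }
}

class ViaOverride extends SketchInstrumentationBase {
  constructor() {
    super('example-instrumentation', '0.0.0');
  }

  getMetadata(): InstrumentationMetadata {
    return { instrumentedPackages: ['example-package'] };
  }
}

console.log(new ViaConstructor().getMetadata());
console.log(new ViaOverride().getMetadata());
```

Consumers only call `getMetadata()`, so both variants look the same from the outside; the difference is only in how each instrumentation supplies the data.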
- Save this data as a JSON file for each package, and publish it to npm alongside the instrumentations. Tools can then read the node_modules folder to extract this info from code, and remote users can git pull to the tag or make an HTTP request to fetch the data when needed. See Collector metadata.yaml as an inspiration.
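For the JSON file option, a tool could read the file straight from an installed package. The sketch below assumes a hypothetical file name, `otel-metadata.json`, at the package root; the name and the `PublishedMetadata` shape are my own placeholders. The demo part writes a sample file to a temporary directory so the example runs without any package installed.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical file name; nothing is published under this name today.
const METADATA_FILE = 'otel-metadata.json';

// Hypothetical shape of the published file.
interface PublishedMetadata {
  description?: string;
  instrumentedPackages?: string[];
}

// Reads the metadata file of one installed package, or returns undefined if
// the package does not ship one.
function readPublishedMetadata(
  nodeModulesDir: string,
  packageName: string
): PublishedMetadata | undefined {
  // Scoped names like "@scope/pkg" map to nested directories.
  const file = path.join(nodeModulesDir, ...packageName.split('/'), METADATA_FILE);
  if (!fs.existsSync(file)) {
    return undefined;
  }
  return JSON.parse(fs.readFileSync(file, 'utf8')) as PublishedMetadata;
}

// Demo: fake a node_modules folder in a temporary directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'otel-metadata-'));
const demoPackage = '@opentelemetry/instrumentation-amqplib';
const demoDir = path.join(demoRoot, ...demoPackage.split('/'));
fs.mkdirSync(demoDir, { recursive: true });
fs.writeFileSync(
  path.join(demoDir, METADATA_FILE),
  JSON.stringify({ instrumentedPackages: ['amqplib'] })
);

const loadedMetadata = readPublishedMetadata(demoRoot, demoPackage);
console.log(loadedMetadata);
```

Since the data lives outside the JavaScript entry point, it would not be pulled into web bundles unless a consumer imports it explicitly.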
Considerations
- many of these fields can be auto-generated and are not a burden to the implementations (github repo, github path, description)
- some of the data is already available in the README and can be documented into a json file where it can be consumed easily.
- It makes sense to me that if we already record such data, we might want to make sure it can, now or one day, be used for other components like detectors, propagators, processors, samplers, etc.
I think that once we come up with a good way to record this info, introducing it to existing components is a relatively simple technical task which I am up for doing.
I would appreciate your thoughts, concerns, suggestions or support, to help make this initiative a success!