community [Proposal] Automate reference documentation as YAML files

I originally raised this proposal in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24189, but after discussing this with @svrnm I think it'd apply to the wider OTel community:

Reference documentation for each component is hard to find, update, and produce. Settings and metrics are, by far, the user-facing elements that change most often between releases. This imposes significant overhead on anyone trying to keep documentation up to date, both upstream and downstream.

An example of overhead is https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/4981, where I added settings docs manually after scavenging for settings using grep. Clearly not ideal if we want to keep documentation up to date.

A solution to this update overhead could be to automate the generation of reference docs into structured data files, so that those files can be parsed and rendered into documentation, both upstream and downstream, effectively becoming a reliable single source of truth for OTel settings, metrics, etc. By reference docs I mean the following:

Metrics, for example Collector receivers' OOTB metrics
Configuration settings
Component metadata (maturity, distros, etc.)
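
To make the idea concrete, here is a minimal sketch of the "parse and render" half. It assumes a metadata file has already been loaded into a plain dictionary (for example with a YAML loader). The component name, field names, and metrics are hypothetical and are not taken from mdatagen or any existing schema.

```python
# Hypothetical metric metadata, shaped like what a YAML loader might return.
# None of these names come from a real OTel component or schema.
METADATA = {
    "component": "examplereceiver",
    "stability": "beta",
    "metrics": {
        "example.connections": {
            "description": "Open connections.",
            "unit": "{connections}",
            "enabled": True,
        },
        "example.latency": {
            "description": "Request latency.",
            "unit": "ms",
            "enabled": False,
        },
    },
}


def render_metrics_table(metadata: dict) -> str:
    """Render the metrics section of a metadata dict as a Markdown table."""
    lines = [
        f"## {metadata['component']} ({metadata['stability']})",
        "",
        "| Metric | Unit | Enabled by default | Description |",
        "| --- | --- | --- | --- |",
    ]
    # Sort by metric name so the output is stable between runs.
    for name in sorted(metadata["metrics"]):
        metric = metadata["metrics"][name]
        enabled = "yes" if metric.get("enabled", False) else "no"
        lines.append(
            f"| `{name}` | {metric['unit']} | {enabled} | {metric['description']} |"
        )
    return "\n".join(lines)


print(render_metrics_table(METADATA))
```

The same dictionary could feed a different renderer downstream (HTML, a vendor docs template, and so on), which is the point of keeping the source as structured data rather than Markdown.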

I believe the Collector repo is already using mdatagen for some things. I've also seen a spike in YAML usage for docs generation in other repos, like the Registry's.

Jul 26 '23 08:07 theletterf

@open-telemetry/docs-approvers please take a look as well.

I think what @theletterf is looking for is similar to what we want to accomplish with the registry eventually. We had multiple discussions in this direction in the past, here are a few of the most recent from @punya and @TylerHelmuth:

https://github.com/open-telemetry/opentelemetry.io/issues/2441

As an end-user looking for instrumentation for my chosen language/library/tool, I want to quickly see a sample of the telemetry I'll get by adopting an OTel instrumentation library, so that I can evaluate whether it'll serve my observability needs.

https://github.com/open-telemetry/opentelemetry.io/issues/2614

Lots of AutoInstrumentation libraries are providing metrics with traces. Currently, it is very difficult to find the list of metrics generated by the auto instrumentation library, and how to enable them. Few auto instrumentation libraries are providing a brief description, but the end-user has to browse each folder to get the information.

We also had a general discussion on filling the registry automatically a few months back (https://github.com/open-telemetry/opentelemetry.io/issues/1855). As @austinlparker stated "this has been a recurring idea for almost five years now (it was suggested for the OpenTracing registry, which was the precursor to the OpenTelemetry one)." What we have as of today is a semi-automatic mechanism (https://github.com/open-telemetry/opentelemetry.io/tree/main/scripts/registry-scanner) that we run from time to time to capture new components, but it still requires a lot of manual intervention and it only provides basic details on a component (name, title, some tags, etc.)

I am a big fan of having "a reliable single source of truth for OTel settings, metrics, etc.", which can be sourced by opentelemetry.io or any other consumer (vendor docs, etc.), but it is a really hard task to tackle: there is the technological problem (read this for details) and a people problem, as this would require a few individuals to invest time & brain power into making this happen and then keep up the maintenance. So as a first step I think there should be a few individuals who want to rally around this problem and look into it more deeply.

Jul 26 '23 13:07 svrnm

@svrnm I'd definitely love to help, if anything from a requirements/testing perspective. Others at Splunk might also want to contribute (@pmcollins @atoulme).

Jul 26 '23 13:07 theletterf

The one point of clarification I'd like to add to this is that we shouldn't try to create a system for components that already have good automated reference documentation, namely language SDKs that emit this information as a part of a build/release process.

Jul 26 '23 16:07 cartermp

@cartermp Could I see an example of that? The idea is that the automated reference documentation is provided as YAML, not as Markdown. Markdown docs are hard to collect and embed, while YAML can be easily processed for a variety of channels, including downstream.

Jul 26 '23 16:07 theletterf

This is one example: https://pkg.go.dev/go.opentelemetry.io/otel/sdk/trace

We intentionally deferred to language SDKs to emit their reference documentation using language-native toolsets, since those often have the best support for things. I wouldn't support trying to force a YAML-based system on them, especially since it would likely imply them having to come up with some manual or pseudo-manual process to keep them up to date rather than relying on the well-tested systems built for their language already.

Jul 26 '23 17:07 cartermp

@cartermp That's fine. The scope of what I refer to is more restricted and isn't concerned with developer / manual instrumentation reference docs. Examples of things I'm referring to:

Settings: https://github.com/splunk/collector-config-tools/blob/main/cfg-metadata/receiver/mongodb.yaml
Metrics: https://github.com/splunk/collector-config-tools/blob/main/metric-metadata/mongodbatlasreceiver.yaml

We're currently generating those from the code. We could go ahead and continue producing these downstream, I guess, but I'd much rather have the whole community benefit from data that could be easily turned into docs anywhere.
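
As a rough illustration of what "generating from the code" can look like, here is a small sketch that introspects a configuration class and emits structured settings metadata. It is not the tooling behind the files linked above: the class, its fields, and the output shape are all hypothetical, and it prints JSON only because that needs no third-party library (a YAML dumper could be swapped in).

```python
import json
from dataclasses import dataclass, field, fields


# Hypothetical configuration class; field names and defaults are illustrative.
@dataclass
class ExampleReceiverConfig:
    endpoint: str = field(
        default="localhost:27017",
        metadata={"doc": "Address of the instance to scrape."},
    )
    collection_interval: str = field(
        default="1m",
        metadata={"doc": "How often metrics are collected."},
    )
    tls_insecure: bool = field(
        default=False,
        metadata={"doc": "Disable TLS verification."},
    )


def describe_settings(config_cls) -> list:
    """Build one metadata record per dataclass field, in declaration order."""
    return [
        {
            "name": f.name,
            "type": getattr(f.type, "__name__", str(f.type)),
            "default": f.default,
            "doc": f.metadata.get("doc", ""),
        }
        for f in fields(config_cls)
    ]


print(json.dumps(describe_settings(ExampleReceiverConfig), indent=2))
```

Because the records are derived from the class itself, a renamed or removed setting shows up in the generated file on the next build instead of relying on someone to grep for it.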

Jul 27 '23 13:07 theletterf
