opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard
New component: Solarwinds APM settings extension
The purpose and use-cases of the new component
The Solarwinds APM settings extension component gets Solarwinds APM specific settings from Solarwinds APM proprietary collector and outputs a solarwinds-apm-settings.json file and a solarwinds-apm-settings-raw file to the temp folder periodically.
The use-case of this new component is for opentelemetry lambda extension layers (Python, Java, NodeJS, .Net) to read the file and perform head-based sampling at the opentelemetry lambda extension layer.
Example configuration for the component
extensions:
solarwindsapmsettings:
endpoint: "apm.collector.cloud.solarwinds.com:443"
key: "<token>:<name>"
interval: 1m
endpoint (Required)
The endpoint which this extension calls getSettings.
Default: apm.collector.cloud.solarwinds.com:443
key (Required)
The key in format <token>:<name> for getSettings from Solarwinds APM collector.
interval (Optional)
Periodic interval to get Solarwinds APM specific settings from Solarwinds APM collector.
Default: 1m
Telemetry data types supported
N/A
Is this a vendor-specific component?
- [X] This is a vendor-specific component
- [x] If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.
Code Owner(s)
@jerrytfleung, @cheempz
Sponsor (optional)
@atoulme
Additional context
No response
I don't follow the use case, but I guess since it's vendor specific that's not a blocker. You need to identify a sponsor for your use case, please read CONTRIBUTING.md and make sure to attend a SIG meeting to present your proposal.
I don't follow the use case, but I guess since it's vendor specific that's not a blocker. You need to identify a sponsor for your use case, please read CONTRIBUTING.md and make sure to attend a SIG meeting to present your proposal.
Thanks for your reply. We will attend a SIG meeting.
Pinging @djaglowski as the next sponsor on the list
I will sponsor this time around as I was missing from the list.
Thank you!
Following up the question raised (which processor will read the remote config?) during SIG meeting, our python lambda layer (a custom distro) has implemented a sampler that calls our proprietary library API to get the head based sampling decision. The proprietary library read the output from this extension for the remote settings and calculate the head-based sampling decision.
Understood. Would it make sense then to make this component a processor rather than an extension? This way it can sit on the pipeline and perform the sampling. Alternatively, do you intend to propose changes to an existing sampling processor?
Above is a simplified version of the use case. The component will
getSettings periodically and output 2 files in tmp. The custom distro language lambda layer will read the file (periodically) and perform head-based sampling calculation. We need to perform that periodically because the remote settings from Solarwinds backend can change time by time (depending on customer's quota, metrics performance...)
Reasons not making this component a processor:
- This component doesn't directly process telemetry data
- Pipeline required a receiver and exporter, while this component task is to generate remote settings files periodically.
- The output file will be read by another process (language lambda layer) so the language lambda layer can perform head based sampling
We don't have plan to propose changes to an existing sampling processor yet.
I see, thanks. Let’s get it in.
please make a first PR according to guidelines in CONTRIBUTING.md with the skeleton of the extension.
I created the first PR according to the guidelines of the first PR.
A github action is complaining EasyCLA Missing CLA Authorization. I don't know what to do with it. Could you please help to advise? Thank you!
The comment says you need to click this link and follow instructions: https://api.easycla.lfx.linuxfoundation.org/v2/repository-provider/github/sign/9522512/196414933/30153/#/?version=2
You will need to review terms and have your company review the contribution terms to the CNCF, and follow up on steps to sign the CLA.
Hey @jerrytfleung please correct me if I'm wrong. Looking at the diagram you provided (thanks for this!), it looks to me that what the extension is designed to do is call an API and write some files to the disk. It doesn't look like the extension exposes any API to any other collector component or really do anything related to the collector that would justify it being implemented as an OpenTelemetry Collector extension.
Perhaps a better approach would be to have tool/binary/script outside of the OpenTelemetry Collector that does the job of calling the API and writing the files to disk.
Pinging @bryan-aguilar and @dmitryax who I think were raising similar questions during the SIG meeting discussion.
I agree with @astencel-sumo. Based on the diagram I do not see the requirement that this needs to exist within a collector lifecycle as an extension. Where does the requirement (a collector extension) originate from? Does the extension rely on the collector it is configured on in any way?
I believe I also gave feedback during the SIG meeting that this seems more appropriate for the https://github.com/open-telemetry/opentelemetry-lambda repository. As proposed, this extension would only be used within a lambda runtime environment and does not have a use case outside of that. The lambda repo has already created it's own components and is hosting them itself. Have you discussed with the FAAS(function as a service) community on whether or not they would accept your component within opentelemetry-lambda?
Previous diagram was a simplified version that didn't show much details on Solarwinds side.
Reasons that it needs to exist within a collector lifecycle
- The settings will be adjusted time by time according to factors. One of the major factor is the data points provided from the collector lambda layer in Solarwinds backends.
- They need to use the same secret to authenticate and we think it is better to keep it within the collector lambda layer and using the configuration like the following.
extensions:
solarwindsapmsettings:
endpoint: "<Solarwinds APM collector endpoint>"
key: ${env:KEY}
exporters:
otlp:
endpoint: "<Solarwinds Observability OTLP endpoint>"
headers:
Authorization: "Bearer ${env:KEY}"
- It exposes the Solarwinds settings via file.
The primary use case is a lambda runtime and now we don't have a use case outside of that. It is because the lambda instrumentation is the first project that shifting to use OTLP. Using open-telemetry collector outside of a lambda deployment is on our roadmap and having this extension in opentelemetry-collector-contrib will support our use cases for that.
I didn't discussed with FAAS community yet as I don't see any vendor specific components there. Also, I don't think a vendor specific components should be hosted in that repo.
Hey @astencel-sumo @bryan-aguilar. Is my reply answer your questions? Please let us know if there are other questions. Thanks!
- The settings will be adjusted time by time according to factors. One of the major factor is the data points provided from the collector lambda layer in Solarwinds backends.
Not sure if I understand this correctly. Do you mean that the extension's behavior may change depending on the data flowing through the collector? I can see one issue with this - an extension does not have access to data flowing through pipelines in the collector. Can you share an example of a behavior of the extension changing depending on the data processed by the collector?
- They need to use the same secret to authenticate
The configuration you shared uses a secret passed in via an environment variable (${env:KEY}). Any other program can use this environment variable - it does not need to be the collector.
- It exposes the Solarwinds settings via file.
I don't see how this is related to the collector.
To sum up, I'm still not convinced that placing this functionality in a collector extension is justified. Any program other than the OpenTelemetry Collector can use the authentication secret from the environment variable, call the Solarwinds APM endpoint, and output the result to a file.
1 & 3. The extension's behavior doesn't change but the content of output files (/tmp/solarwinds-apm-settings.json & /tmp/solarwinds-apm-settings.raw) changes depending on the data flowing through collector. For instance, Solarwinds backend can adjust the content of the output files (e.g. sample rate) dynamically based on a particular service or change the rate limiting to the default strategy dynamically.
Indeed this extension is inspired by extension\jaegerremotesampling to get a remote setting from backend. extension\solarwindsapmgetsettingsextension does similar logic as extension\jaegerremotesampling by periodically reload the remote setting from backend. The difference is extension\solarwindsapmgetsettingsextension exposes the remote setting via JSON file while extension\jaegerremotesampling exposes the remote setting via HTTP/JSON or GRPC.
We do see Jaeger provided Jaeger Remote Sampler for anyone who can consume Jaeger API. Currently we also have something similar, a Solarwinds Sampler calls a Solarwinds internal library's function that consumes the file and returns the sampling decision to the sampler. The difference is Jaeger Remote Sampler is part of opentelemetry-java while our Solarwinds Sampler is part of our OTel Python custom distribution for now.
From the collector extension README.md, Generally, extensions are used for implementing components that can be added to the Collector, but which do not require direct access to telemetry data and are not part of the pipelines (like receivers, processors or exporters). So we think extension\solarwindsapmgetsettingsextension is a good way to go. We just aware that an extension does not have access to data flowing through pipelines in the collector could be an issue. Can you help to elaborate the reason and the impact on that?
- Agree that environment variable is global and can be use by any program.
It is true that any program can call Solarwinds APM endpoint and output to a file. However, we want the remote setting to be available when collector is up and ready and the remote setting to be expired when collector is down. We found that using collector extension is a way to do so.
In contrast, can you please help to share why extension\jaegerremotesampling can be justified but not extension\solarwindsapmgetsettingsextension? We are open to recommendation/suggested changes and happy to meet open-telemetry community's expectation.
Finally, thanks for your input and your time to understand our design. Much appreciated.
Indeed this extension is inspired by extension\jaegerremotesampling to get a remote setting from backend.
This is an interesting observation, thanks, I wasn't aware of this. We'd have to ask @jpkrohling or @bogdandrutu what was the rationale back then, as they were apparently involved in the creation of the Jaeger extension: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/6510.
It doesn't seem right to me to have a functionality as part of the collector that is not related to the working of the collector itself, but I wonder what others think. @open-telemetry/collector-contrib-maintainer I'd love your opinions here. This comment above has the summary of the issue: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27668#issuecomment-1876659127:
It doesn't look like the extension exposes any API to any other collector component or really do anything related to the collector that would justify it being implemented as an OpenTelemetry Collector extension.
Perhaps a better approach would be to have tool/binary/script outside of the OpenTelemetry Collector that does the job of calling the API and writing the files to disk.
@jerrytfleung raises the point in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27668#issuecomment-1885784243 that the Jaeger extension already does this kind of thing.
I can provide the background for the decision around Jaeger, but I can't judge whether the same would apply for this component here.
Jaeger Collector has a feature where Jaeger Clients and Agents can request a sampling configuration. When Jaeger decided to phase out their clients in favor of OTel SDKs, one of the requirements they had was to provide feature parity with Jaeger Clients. This meant adding remote samplers to key SDKs and an endpoint to the Collector to serve sampling configuration to said clients and agents in a way that is compatible with Jaeger Collector. We initially implemented this as part of the Jaeger Receiver, but eventually, people wanted to use the remote sampling feature but not the Jaeger receiver, which is why we split it into its own extension.
It doesn't seem right to me to have a functionality as part of the collector that is not related to the working of the collector itself, but I wonder what others think
I assume you are talking specifically about accepting a component for control, and in that case, I think there's another scenario I would consider accepting an extension: if it helps our end-users simplify they deployments by reducing the number of agents they have on their infra. If this extension here means that people can stop using a proprietary agent in favor of OTel Collector, I think it's a good use case.
Yes, I think we are in the situation like Jaegar was. We are totally in favor of OTel(s) but we still need to provide feature parity with Solarwinds APM clients.
We are migrating Solarwinds' AWS instrumentation offering based on our proprietary agent to OTel collector & OTel instrumenations.
As mentioned in above reply, using open-telemetry collector in our infra (outside of a lambda deployment) is on our roadmap. Our goal is to adopt OTel collector instead of building more proprietary code.
Please let us know if it can be judged as a good use case. Also, we are happy to fill any gaps so as to make it as a good use case.
All right, sounds good to me. I'm not going to oppose the introduction of this extension anymore. Thanks for the productive discussion! :clap:
Removing on-hold tag on the PR.
This is the 2nd PR adding part of the concrete implementation of the extension
@atoulme I don't know if it is appropriate to ask you. Can you kindly take a look to the 2nd PR? Thanks!
Thanks, I took a look, please review.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Could this be reopened since we are actively working on the final implementation https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33315