opentelemetry-collector
opentelemetry-collector copied to clipboard
Enables gRPC Health Checking Protocol on Open Telemetry Collector
Description: Adding a feature - Enabling gRPC Health Checking Protocol on Open Telemetry Collector.
Link to tracking Issue: Not Applicable
Testing: Before this PR, if you use any gRPC tool [1,2,3] to probe otelcol gRPC port (4317) you would get as response:
$> grpc-health-probe -addr=localhost:4317
error: this server does not implement the grpc health protocol (grpc.health.v1.Health): unknown service grpc.health.v1.Health
Now, the gRPC server responds properly to gRPC Health Protocol:
$> grpc-health-probe -addr=localhost:4317
status: SERVING
- [1] grpc-health-probe
- [2] grpcurl
- [3] ghz
Documentation: I will be glad to add documentation somewhere in the project. By now, I am considering these changes simply complies with the gRPC industry standard gRPC Health Checking Protocol. Thus, I think it worths an entry on the Changelog but not necessarily a documentation entry.
Why these changes matter? It is quite common for gRPC users to probe their gRPC endpoints using the tools described above. It helps checking if your remote server is up and running before even knowing details of its remote routes/methods.
Therefore, this PR improves developer experience while using or getting started with Open Telemetry Collector. Here follow some of arguments for this PR:
- It makes Collector more observable which complies with this section of the long-term vision
- It makes Collector more usable out of the box as described in this section of the long-term vision
- It adds a quite simple feature which complies with the contributing guideline
- It does not add ANY new dependency. Basically we simply enabled a feature that was sitting in the project but not getting used. The code uses
google.golang.org/grpclibrary which was already imported in the Collector code.
Please let me know if there is anything missing here or anything I can improve in this contribution.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Any feedback here, folks? Is there anything I can do to trigger the workflows so I can see any pending contributions I need to do?
I triggered them for you.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.56%. Comparing base (
1d52fb9) to head (ab3cc9e). Report is 45 commits behind head on main.
:exclamation: Current head ab3cc9e differs from pull request most recent head ac73929
Please upload reports for the commit ac73929 to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## main #8955 +/- ##
==========================================
- Coverage 91.63% 91.56% -0.08%
==========================================
Files 356 360 +4
Lines 16849 16695 -154
==========================================
- Hits 15439 15286 -153
- Misses 1068 1073 +5
+ Partials 342 336 -6
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi folks, I have added the changelog entry (file). Can we trigger CI once again to check if everything is ok?
Thank you for the suggestions, @TylerHelmuth. I have also updated this PR with latest changes that was on main branch. Do you mind triggering the CI so we can check if there is anything missing? Please, let me know if you see any other improvements.
@pantuza I don't have the right permissions in this repo. @open-telemetry/collector-approvers please take a look.
Is this a dup of https://github.com/open-telemetry/opentelemetry-collector/pull/8397 ?
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Hello @pantuza, Is there anything that might be changed from your side to pass checks?
FYI, @open-telemetry/collector-approvers, @dmitryax
Hello @pantuza, Is there anything that might be changed from your side to pass checks?
FYI, @open-telemetry/collector-approvers, @dmitryax
Hi, @ksandrmatveyev. Thank you for the reply. Not actually. I don't have permissions to trigger the CI execution. Last time it was triggered, everything passed.
I see that by now this PR is outdated from Master/Main branch. I will make sure to update it.
I would be really glad If someone can trigger the CI so we can check if everything is ok.
@pantuza I'm adding this PR to the collector's SIG agenda on Monday.
is there any reason a user wouldn't want to have this enabled?
In general, it's good practice to have admin-related endpoints at a different port than the workload, but for this specific one, I think it's ok and desirable to have it the way it's implemented in this PR.
Seems it fails on test phase https://github.com/open-telemetry/opentelemetry-collector/actions/runs/8838230921/job/24299483330?pr=8955 FYI @pantuza
Seems it fails on test phase https://github.com/open-telemetry/opentelemetry-collector/actions/runs/8838230921/job/24299483330?pr=8955 FYI @pantuza
Hi, @ksandrmatveyev . Thank you for the reply.
Looking at the error, it is related to contrib project. That jaegerremotesampling extension is registering the Health server. Thus, gRPC fails saying there are two codes trying to register the same service (reference of the error).
Since we are enabling it directly on the Collector, no extension needs to enable it. It will become a builtin feature of the Collector. What do you think?
As a follow up for this PR, I can send a Pull Request to contrib project disabling this on jaegerremotesampling code.
Any thoughts on my suggestions?
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.