opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

Enables gRPC Health Checking Protocol on Open Telemetry Collector

Open pantuza opened this issue 2 years ago • 19 comments
trafficstars

Description: Adding a feature - Enabling gRPC Health Checking Protocol on Open Telemetry Collector.

Link to tracking Issue: Not Applicable

Testing: Before this PR, if you use any gRPC tool [1,2,3] to probe otelcol gRPC port (4317) you would get as response:

$> grpc-health-probe -addr=localhost:4317
error: this server does not implement the grpc health protocol (grpc.health.v1.Health): unknown service grpc.health.v1.Health

Now, the gRPC server responds properly to gRPC Health Protocol:

$> grpc-health-probe -addr=localhost:4317
status: SERVING

Documentation: I will be glad to add documentation somewhere in the project. By now, I am considering these changes simply complies with the gRPC industry standard gRPC Health Checking Protocol. Thus, I think it worths an entry on the Changelog but not necessarily a documentation entry.

Why these changes matter? It is quite common for gRPC users to probe their gRPC endpoints using the tools described above. It helps checking if your remote server is up and running before even knowing details of its remote routes/methods.

Therefore, this PR improves developer experience while using or getting started with Open Telemetry Collector. Here follow some of arguments for this PR:

  1. It makes Collector more observable which complies with this section of the long-term vision
  2. It makes Collector more usable out of the box as described in this section of the long-term vision
  3. It adds a quite simple feature which complies with the contributing guideline
  4. It does not add ANY new dependency. Basically we simply enabled a feature that was sitting in the project but not getting used. The code uses google.golang.org/grpc library which was already imported in the Collector code.

Please let me know if there is anything missing here or anything I can improve in this contribution.

pantuza avatar Nov 18 '23 03:11 pantuza

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Dec 03 '23 03:12 github-actions[bot]

Any feedback here, folks?  Is there anything I can do to trigger the workflows so I can see any pending contributions I need to do?

pantuza avatar Dec 06 '23 21:12 pantuza

I triggered them for you.

atoulme avatar Dec 06 '23 22:12 atoulme

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.56%. Comparing base (1d52fb9) to head (ab3cc9e). Report is 45 commits behind head on main.

:exclamation: Current head ab3cc9e differs from pull request most recent head ac73929

Please upload reports for the commit ac73929 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8955      +/-   ##
==========================================
- Coverage   91.63%   91.56%   -0.08%     
==========================================
  Files         356      360       +4     
  Lines       16849    16695     -154     
==========================================
- Hits        15439    15286     -153     
- Misses       1068     1073       +5     
+ Partials      342      336       -6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Dec 06 '23 22:12 codecov[bot]

Hi folks, I have added the changelog entry (file). Can we trigger CI once again to check if everything is ok?

pantuza avatar Dec 08 '23 20:12 pantuza

Thank you for the suggestions, @TylerHelmuth. I have also updated this PR with latest changes that was on main branch. Do you mind triggering the CI so we can check if there is anything missing? Please, let me know if you see any other improvements.

pantuza avatar Dec 12 '23 13:12 pantuza

@pantuza I don't have the right permissions in this repo. @open-telemetry/collector-approvers please take a look.

TylerHelmuth avatar Dec 12 '23 18:12 TylerHelmuth

Is this a dup of https://github.com/open-telemetry/opentelemetry-collector/pull/8397 ?

atoulme avatar Dec 19 '23 23:12 atoulme

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jan 03 '24 03:01 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jan 18 '24 03:01 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Feb 03 '24 03:02 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Feb 22 '24 03:02 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Mar 21 '24 03:03 github-actions[bot]

Hello @pantuza, Is there anything that might be changed from your side to pass checks?

FYI, @open-telemetry/collector-approvers, @dmitryax

ksandrmatveyev avatar Apr 04 '24 07:04 ksandrmatveyev

Hello @pantuza, Is there anything that might be changed from your side to pass checks?

FYI, @open-telemetry/collector-approvers, @dmitryax

Hi, @ksandrmatveyev. Thank you for the reply. Not actually. I don't have permissions to trigger the CI execution. Last time it was triggered, everything passed.

I see that by now this PR is outdated from Master/Main branch. I will make sure to update it.

I would be really glad If someone can trigger the CI so we can check if everything is ok.

pantuza avatar Apr 04 '24 13:04 pantuza

@pantuza I'm adding this PR to the collector's SIG agenda on Monday.

TylerHelmuth avatar Apr 22 '24 16:04 TylerHelmuth

is there any reason a user wouldn't want to have this enabled?

In general, it's good practice to have admin-related endpoints at a different port than the workload, but for this specific one, I think it's ok and desirable to have it the way it's implemented in this PR.

jpkrohling avatar Apr 24 '24 16:04 jpkrohling

Seems it fails on test phase https://github.com/open-telemetry/opentelemetry-collector/actions/runs/8838230921/job/24299483330?pr=8955 FYI @pantuza

ksandrmatveyev avatar May 10 '24 15:05 ksandrmatveyev

Seems it fails on test phase https://github.com/open-telemetry/opentelemetry-collector/actions/runs/8838230921/job/24299483330?pr=8955 FYI @pantuza

Hi, @ksandrmatveyev . Thank you for the reply.

Looking at the error, it is related to contrib project. That jaegerremotesampling extension is registering the Health server. Thus, gRPC fails saying there are two codes trying to register the same service (reference of the error).

Since we are enabling it directly on the Collector, no extension needs to enable it. It will become a builtin feature of the Collector. What do you think?

As a follow up for this PR, I can send a Pull Request to contrib project disabling this on jaegerremotesampling code. Any thoughts on my suggestions?

pantuza avatar May 10 '24 19:05 pantuza

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 25 '24 03:05 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Jun 09 '24 03:06 github-actions[bot]