opentelemetry-ruby

Log more details for "Unable to export # spans"

Open • francisdb opened this issue 2 years ago • 5 comments

Description of the bug

I'm trying to send traces to our OTLP-compliant Tempo service. Our JVM services can send traces without issues when only the trace endpoint is configured.

Now I'm trying to do the same for our ruby code but have the following issue:

OTEL_EXPORTER_OTLP_ENDPOINT="https://internal.url" OTEL_EXPORTER_OTLP_PROTOCOL=grpc OTEL_LOG_LEVEL=DEBUG

D, [2022-03-25T20:23:01.357950 #71922] DEBUG -- : Upgrading default proxy tracer provider to OpenTelemetry::SDK::Trace::TracerProvider
I, [2022-03-25T20:23:01.361189 #71922]  INFO -- : Instrumentation: OpenTelemetry::Instrumentation::Faraday was successfully installed

(perform some HTTP calls here)

E, [2022-03-25T20:23:32.033916 #71922] ERROR -- : OpenTelemetry error: Unable to export 30 spans
E, [2022-03-25T20:23:42.271592 #71922] ERROR -- : OpenTelemetry error: Unable to export 72 spans

The main problem is that there is no output beyond this, so it is very hard to tell what is failing.

The HTTP calls correctly carry the traceparent header with the surrounding trace ID: we can look up the spans recorded by the services this code talks to.
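For reference, the `traceparent` header follows the W3C Trace Context layout `version-trace_id-parent_id-flags`. The minimal sketch below pulls the trace ID out of such a header so it can be searched for in the tracing backend. `parse_traceparent` is a hypothetical helper, not the SDK's propagator, and it deliberately accepts only version `00`.

```ruby
# W3C Trace Context, version 00: 2 + 32 + 16 + 2 lowercase hex digits.
TRACEPARENT_RE = /\A(?<version>[0-9a-f]{2})-(?<trace_id>[0-9a-f]{32})-(?<parent_id>[0-9a-f]{16})-(?<flags>[0-9a-f]{2})\z/

# Hypothetical helper: returns the header's fields, or nil if it is invalid.
def parse_traceparent(header)
  m = TRACEPARENT_RE.match(header.to_s.strip)
  return nil unless m && m[:version] == '00'
  # All-zero trace and parent IDs are invalid per the spec.
  return nil if m[:trace_id] == '0' * 32 || m[:parent_id] == '0' * 16

  {
    trace_id: m[:trace_id],
    parent_id: m[:parent_id],
    # Bit 0 of the flags byte is the "sampled" flag.
    sampled: (m[:flags].to_i(16) & 0x01) == 1
  }
end

header = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
puts parse_traceparent(header)[:trace_id] # prints 4bf92f3577b34da6a3ce929d0e0e4736
```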

Share details about your runtime

Operating system details: macOS 12.3 / Ubuntu Linux
RUBY_ENGINE: "ruby"
RUBY_VERSION: "3.1.1"
RUBY_DESCRIPTION: "ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [arm64-darwin21]"

francisdb, Mar 25 '22

This problem occurs because we don't support gRPC in the OTLP exporter; we only support HTTP. Unfortunately, we don't support the OTEL_EXPORTER_OTLP_PROTOCOL env var either, so we configure an OTLP/HTTP exporter regardless of that variable.

We don't anticipate having an OTLP/gRPC exporter 🔜 (though PRs are welcome), but we should interpret the OTEL_EXPORTER_OTLP_PROTOCOL env var, log a warning if grpc is requested, and in that case not configure the OTLP/HTTP exporter.
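A minimal sketch of the proposed behavior, assuming only OTLP/HTTP with protobuf is implemented: read `OTEL_EXPORTER_OTLP_PROTOCOL`, and if it names anything else (such as `grpc`), log a warning and return `nil` so the caller skips configuring the exporter. The helper name, the constants and the `http/protobuf` default are illustrative assumptions, not the gem's code.

```ruby
require 'logger'
require 'stringio'

# Illustrative constants: the values the OTLP exporter spec defines for the
# protocol variable, and the subset an HTTP-only exporter can honor.
KNOWN_OTLP_PROTOCOLS = %w[grpc http/protobuf http/json].freeze
SUPPORTED_OTLP_PROTOCOLS = %w[http/protobuf].freeze

# Hypothetical helper: returns the protocol to configure, or nil when the
# requested one cannot be honored, warning instead of silently falling back.
def resolve_otlp_protocol(env, logger)
  requested = env.fetch('OTEL_EXPORTER_OTLP_PROTOCOL', 'http/protobuf').strip.downcase
  return requested if SUPPORTED_OTLP_PROTOCOLS.include?(requested)

  if KNOWN_OTLP_PROTOCOLS.include?(requested)
    logger.warn("OTEL_EXPORTER_OTLP_PROTOCOL=#{requested} is not supported; not configuring the OTLP exporter")
  else
    logger.warn("OTEL_EXPORTER_OTLP_PROTOCOL=#{requested} is not a recognized OTLP protocol")
  end
  nil
end

log = StringIO.new
protocol = resolve_otlp_protocol({ 'OTEL_EXPORTER_OTLP_PROTOCOL' => 'grpc' }, Logger.new(log))
puts protocol.inspect # prints nil
puts log.string
```

In a real application `ENV` can be passed directly, since it also responds to `fetch` with a default.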

fbogsany, Mar 28 '22

It would already help if the code logged a warning that the requested protocol is not supported and that it is falling back to another one.

But even then: I have been testing with OTLP/HTTP, and if something is wrong in the URL you get the same unhelpful messages without any details. It would be better if, for example, a 404 error were logged somewhere, perhaps only at debug level to avoid spamming.
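One frequent source of 404s with OTLP/HTTP is the request path. Per the OpenTelemetry exporter specification, `OTEL_EXPORTER_OTLP_ENDPOINT` is a base URL to which `v1/traces` is appended, while `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is. The sketch below is a hypothetical helper, assuming the spec's default of `http://localhost:4318`; it resolves the final URL so it can be logged and compared with what the backend actually serves.

```ruby
# Spec default for OTLP/HTTP when no endpoint is configured.
DEFAULT_OTLP_HTTP_ENDPOINT = 'http://localhost:4318'

# Hypothetical helper: the URL an OTLP/HTTP trace exporter would POST to.
def otlp_http_traces_url(env)
  signal = env['OTEL_EXPORTER_OTLP_TRACES_ENDPOINT'].to_s.strip
  # The signal-specific variable is used as-is, without appending a path.
  return signal unless signal.empty?

  base = env['OTEL_EXPORTER_OTLP_ENDPOINT'].to_s.strip
  base = DEFAULT_OTLP_HTTP_ENDPOINT if base.empty?
  # The generic variable is a base URL: append v1/traces with exactly one slash.
  "#{base.chomp('/')}/v1/traces"
end

puts otlp_http_traces_url({ 'OTEL_EXPORTER_OTLP_ENDPOINT' => 'https://internal.url' })
# prints https://internal.url/v1/traces
```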

francisdb, Mar 29 '22

Would be better if for example a 404 error is logged somewhere, could be on debug level only to avoid spamming.

I think that's reasonable. We have a bunch of error handling in the exporter and support an optional metrics reporter. In general, we try to avoid spamming logs, but I think logging is justified in this case. At the moment, we fall into https://github.com/open-telemetry/opentelemetry-ruby/blob/8eac4a112f996e088e693add37227c11a67baa2d/exporter/otlp/lib/opentelemetry/exporter/otlp/exporter.rb#L196-L199, which just resets the connection and returns FAILURE without logging or otherwise reporting the error. There are two options here: explicitly handle the 404, or simply log any unrecognized error in this block.
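The two options can be combined in a single `case` over the response class. This is a hedged sketch using only Ruby's standard `Net::HTTP` response classes; `handle_export_response` and the `:success`/`:failure` symbols are hypothetical stand-ins for the exporter's internals, not the code at the linked lines. Debug level follows the suggestion above to avoid spamming.

```ruby
require 'logger'
require 'net/http'
require 'stringio'

# Hypothetical sketch: classify an OTLP/HTTP export response, logging
# anything that is not a success instead of failing silently.
def handle_export_response(response, logger)
  case response
  when Net::HTTPSuccess
    :success
  when Net::HTTPNotFound
    # Option 1: handle 404 explicitly; it often points at a wrong URL or path.
    logger.debug('OTLP export failed: 404 Not Found, check the exporter endpoint')
    :failure
  else
    # Option 2: log any unrecognized response before returning a failure.
    logger.debug("OTLP export failed: unexpected response #{response.code} #{response.message}")
    :failure
  end
end

log = StringIO.new
handle_export_response(Net::HTTPNotFound.new('1.1', '404', 'Not Found'), Logger.new(log))
puts log.string
```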

fbogsany, Mar 29 '22

I've been using the JS and JVM libraries, and they always log at least the error type and message, including for non-404 errors, e.g. an HTTP protocol error when connecting to the wrong endpoint.
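In the spirit of the JS/JVM behavior described above, a minimal sketch: wrap one export attempt and, on any `StandardError`, log the exception class and message alongside the span count before returning a failure. `export_with_error_logging` and the `send_batch` callable are hypothetical; the callable stands in for the real HTTP request.

```ruby
require 'logger'
require 'net/http'
require 'stringio'

# Hypothetical wrapper: `send_batch` is any callable that performs the request
# and raises on transport or protocol errors.
def export_with_error_logging(span_count, logger, send_batch)
  send_batch.call
  :success
rescue StandardError => e
  # Include the error type and message, not just the span count.
  logger.error("Unable to export #{span_count} spans: #{e.class}: #{e.message}")
  :failure
end

log = StringIO.new
# Simulate talking OTLP/HTTP to something that is not an HTTP server.
failing = -> { raise Net::HTTPBadResponse, 'wrong status line' }
export_with_error_logging(30, Logger.new(log), failing)
puts log.string
```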

francisdb, Mar 29 '22

We have a bunch of error handling in the exporter and support an optional metrics reporter

@fbogsany would you suggest setting a metrics reporter to understand why spans are failing to export? We're also starting to notice a slight uptick in these failures in our production Ruby instrumentation (we run an internal OTel Collector agent plus a standalone Collector, and the collectors seem healthy).

We need better visibility into why the exports are failing in production, so we're setting up agent Collector metrics to see whether spans are being refused or dropped. However, we'd also like our instrumentation to be more informative about why the OTel error occurs in the first place. Having stumbled across the following, the metrics reporter seems like a good fit here:

https://github.com/open-telemetry/opentelemetry-ruby/blob/8eac4a112f996e088e693add37227c11a67baa2d/exporter/otlp/lib/opentelemetry/exporter/otlp/exporter.rb#L248

nvolker, Sep 07 '22

Hi @nvolker, I have a similar need to get more information on the OpenTelemetry errors. Could you please explain how to collect the reason for the failures from the metrics reporter?
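One possible approach, sketched under assumptions: the SDK's metrics reporter is a duck type with `add_to_counter`, `record_value` and `observe_value`, and the OTLP exporter accepts an instance through a `metrics_reporter:` keyword. A reporter that simply logs every call, labels included, would surface whatever failure reason the exporter attaches. Verify these signatures, and the metric and label names the exporter actually emits, against the gem version you run; `LoggingMetricsReporter` is hypothetical and the metric name below is illustrative only.

```ruby
require 'logger'
require 'stringio'

# Assumed duck type: mirrors the method names and keyword arguments of the
# SDK's MetricsReporter as we understand them; check your SDK version.
class LoggingMetricsReporter
  def initialize(logger)
    @logger = logger
  end

  def add_to_counter(metric, increment: 1, labels: {})
    @logger.info("counter #{metric} +#{increment} {#{format_labels(labels)}}")
  end

  def record_value(metric, value:, labels: {})
    @logger.info("value #{metric}=#{value} {#{format_labels(labels)}}")
  end

  def observe_value(metric, value:, labels: {})
    @logger.info("observed #{metric}=#{value} {#{format_labels(labels)}}")
  end

  private

  # Render labels as k=v pairs so failure reasons are easy to grep for.
  def format_labels(labels)
    labels.map { |k, v| "#{k}=#{v}" }.join(',')
  end
end

log = StringIO.new
reporter = LoggingMetricsReporter.new(Logger.new(log))
# Simulate what an exporter might report on a 404; the metric name is illustrative.
reporter.add_to_counter('otel.otlp_exporter.failure', labels: { 'reason' => '404' })
puts log.string
# Assumed wiring (not executed here):
#   OpenTelemetry::Exporter::OTLP::Exporter.new(metrics_reporter: reporter)
```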

vjcracker, Nov 30 '22

Just bumping this thread with another request for more logging information. In our case, we expected that setting the Rails log level to debug would report more information, but after digging into the code we saw that the exporter didn't use that setting at all. Ultimately, it turned out to be a misconfigured ENV var, so even having the error message report on expected (and missing) config values would've been a huge help!
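A start-up check along the lines requested could log which OTLP settings are present and which are missing, redacting headers because they often carry credentials. This is a hypothetical sketch, not what the gem logs: the variable names come from the OpenTelemetry specification, but which of them a deployment actually requires is an assumption to adjust.

```ruby
require 'logger'
require 'stringio'

# Assumed list of settings this deployment expects; adjust as needed.
EXPECTED_OTLP_VARS = %w[
  OTEL_SERVICE_NAME
  OTEL_EXPORTER_OTLP_ENDPOINT
  OTEL_EXPORTER_OTLP_HEADERS
].freeze
# Values that may contain credentials and must not be logged verbatim.
SECRET_OTLP_VARS = %w[OTEL_EXPORTER_OTLP_HEADERS].freeze

# Hypothetical helper: logs each expected setting and returns the missing ones.
def report_otlp_config(env, logger)
  missing = []
  EXPECTED_OTLP_VARS.each do |name|
    value = env[name].to_s.strip
    if value.empty?
      missing << name
      logger.warn("#{name} is not set")
    else
      shown = SECRET_OTLP_VARS.include?(name) ? '[REDACTED]' : value
      logger.info("#{name}=#{shown}")
    end
  end
  missing
end

log = StringIO.new
report_otlp_config({ 'OTEL_SERVICE_NAME' => 'billing' }, Logger.new(log))
puts log.string
```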

gjtorikian, Apr 25 '23

I think this has been closed by https://github.com/open-telemetry/opentelemetry-ruby/pull/1565; please re-open it if that did not address your concerns!

plantfansam, Jan 25 '24