opentelemetry-rust
More information in OpenTelemetry export errors
We are currently using OpenTelemetry with the OTLP exporter. We get errors like this while batch-exporting:
OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (Some resource has been exhausted): , detailed error message: grpc: received message larger than max (9074759 vs. 4194304)
Clearly we are sending spans that are too large. However, it's not easy to track down where the problematic spans originate. It would be good if the error message included some information that could help diagnose the issue.
Logging all the span names is probably not a good idea, since there could be a lot of them. Some kind of sampling might help. Alternatively, for this specific error, the approximate byte size of each span could be tallied and logged when it makes up a significant fraction of the payload. This might be getting into too-specific solutions, though.
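Not a fix for the visibility problem itself, but one client-side mitigation is to shrink the export batch so each OTLP request stays under the server's default 4 MiB gRPC limit. A minimal sketch, assuming an opentelemetry-otlp 0.x pipeline with the `tonic` feature on a Tokio runtime (builder names have moved between releases, and this does not help when a single span is already over the limit):

```rust
use opentelemetry::sdk::trace::BatchConfig;
use opentelemetry_otlp::WithExportConfig;

fn init_tracer() -> Result<opentelemetry::sdk::trace::Tracer, opentelemetry::trace::TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"), // assumed collector endpoint
        )
        // The spec default is 512 spans per export; a smaller batch keeps each
        // gRPC message comfortably below the server's 4 MiB default limit.
        .with_batch_config(BatchConfig::default().with_max_export_batch_size(64))
        .install_batch(opentelemetry::runtime::TokioCurrentThread)
}
```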
@TommyCpp
Are the tags for this supposed to be under metrics?
Good catch! Fixed the label
I am getting the same error in one project.
OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (Some resource has been exhausted): , detailed error message: grpc: received message larger than max (169759662 vs. 4194304)
I get the same behavior whether I use .install_simple() or .install_batch(opentelemetry::runtime::TokioCurrentThread).
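For reference, a minimal sketch of those two installation paths, assuming opentelemetry-otlp 0.x: install_simple exports each span synchronously as it ends, while install_batch queues spans and flushes them in batches on the given runtime. Both go through the same OTLP/gRPC exporter, so either can hit the server's message-size limit.

```rust
use opentelemetry::{sdk::trace::Tracer, trace::TraceError};

// Synchronous export: each finished span is sent immediately.
fn simple() -> Result<Tracer, TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic())
        .install_simple()
}

// Batched export: spans are queued and sent periodically on the runtime.
fn batch() -> Result<Tracer, TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic())
        .install_batch(opentelemetry::runtime::TokioCurrentThread)
}
```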
No traces are emitted for about 7.5 seconds; after that traces do come through, but the application keeps logging these errors.
I have another app configured identically that does not have this issue. Both use the Actix web framework. The one that shows the problem uses the Rust AWS SDK to put events on a Kinesis stream, and it also uses the MaxMind db to turn IP addresses into locations. Once I aggressively skipped certain arguments in the instrumented functions, things started to work. In particular, I had instrumented a function that took the MaxMind db as an argument; once I skipped that argument, things worked.
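If the spans come from the tracing crate's #[instrument] macro (a common setup with Actix and tracing-opentelemetry), the "skip the argument" workaround looks roughly like the sketch below. Without skip, every argument is captured as a span field via its Debug output, so a large in-memory value such as a MaxMind reader can inflate the span enormously. The type and function names here are hypothetical.

```rust
use std::net::IpAddr;

// Stand-in for a large in-memory database handle (e.g. a MaxMind reader).
pub struct GeoDb(/* ... */);

// `skip(db)` keeps `ip` as a span field but never serializes the database
// into the span, which is what previously blew up the export size.
#[tracing::instrument(skip(db))]
pub fn lookup_location(db: &GeoDb, ip: IpAddr) -> Option<String> {
    let _ = (db, ip); // lookup elided in this sketch
    None
}
```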
It would be nice to be able to debug this stuff: emit a message when a trace exceeds a given size, so we can narrow down the culprits.
Just experienced the same issue. It is not even clear from the log message whether this happens because of batching or because of an individual tracing statement.
At minimum, improved logging here would help.
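Until the exporter reports which spans pushed a request over the limit, one small improvement is to route OpenTelemetry's internal errors through the application logger instead of bare stderr. A minimal sketch, assuming an opentelemetry 0.x release that still exposes global::set_error_handler; it does not identify the offending spans, which is the feature being asked for here:

```rust
fn route_otel_errors_to_log() {
    // Replace the default stderr handler so export failures show up in the
    // application's structured logs with timestamps and surrounding context.
    let _ = opentelemetry::global::set_error_handler(|error| {
        tracing::warn!(%error, "OpenTelemetry export error");
    });
}
```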