opentelemetry-ruby

OTLP::Exporter#encode - bignum too big to convert into `unsigned long long'

Open • manasaheggere opened this issue 1 year ago • 18 comments

Hi Team,

We are using Ruby 3.2.5 and Rails 7.1.3.3.

We have installed the following OpenTelemetry gems in our Gemfile:

gem "opentelemetry-sdk"
gem "opentelemetry-exporter-otlp"
gem "opentelemetry-instrumentation-all"

and configured the SDK in opentelemetry.rb:

require 'opentelemetry/sdk'
require 'opentelemetry/instrumentation/all'
require 'opentelemetry-exporter-otlp'

OpenTelemetry::SDK.configure do |c|
  c.service_name = <service_name>
  c.use_all()
  c.add_span_processor(
    OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(
      OpenTelemetry::Exporter::OTLP::Exporter.new
    )
  )
end
MyAppTracer = OpenTelemetry.tracer_provider.tracer(<tracer>)

We have also configured the following environment variables:

- name: OTEL_EXPORTER
  value: 'otlp'
- name: JAEGER_DISABLED
  value: 'true'
- name: JAEGER_SERVICE_NAME
  value: <service_name>
- name: JAEGER_AGENT_HOST
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: status.hostIP
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://$(JAEGER_AGENT_HOST):4318
- name: OTEL_SERVICE_NAME
  value: <service_name>

But we are seeing the following errors in our pre-prod environment:

ERROR -- : OpenTelemetry error: Unable to export 485 spans
ERROR -- : OpenTelemetry error: unexpected error in OTLP::Exporter#encode - bignum too big to convert into `unsigned long long' - /home/circleci/.rubygems/gems/opentelemetry-exporter-otlp-0.29.0/lib/opentelemetry/exporter/otlp/exporter.rb:327:in `initialize'

manasaheggere, Dec 04 '24 06:12

@manasaheggere, thanks for reaching out! I'm sorry to hear about the export errors.

Were things working on an earlier version of the OTLP exporter?

The line referenced in the error points to the encoding of a timestamp for a span event. https://github.com/open-telemetry/opentelemetry-ruby/blob/2f3ec5995d840d6c06267f50bd9c9ff534f0587e/exporter/otlp/lib/opentelemetry/exporter/otlp/exporter.rb#L327
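For context, a minimal sketch of the constraint behind this message: in the OTLP protobuf definitions, span and event timestamps such as `Span.Event.time_unix_nano` are `fixed64` (unsigned 64-bit) fields, so an Integer outside `0..2**64 - 1` cannot be encoded, and Ruby's integer conversion reports that as "bignum too big to convert into `unsigned long long'". The helper `encodable_unix_nano?` is hypothetical and not part of the exporter.

```ruby
# OTLP declares span and event timestamps (for example Span.Event.time_unix_nano)
# as fixed64: an unsigned 64-bit integer.
UINT64_MAX = 2**64 - 1

# Hypothetical helper: can this value be encoded as a fixed64 nanosecond timestamp?
def encodable_unix_nano?(value)
  value.is_a?(Integer) && value.between?(0, UINT64_MAX)
end

# Nanoseconds since the epoch for an instant in December 2024.
dec_2024_ns = 1_733_300_000 * 1_000_000_000

puts encodable_unix_nano?(dec_2024_ns)    # prints true
puts encodable_unix_nano?(UINT64_MAX + 1) # prints false
```

A correctly scaled nanosecond timestamp has plenty of headroom, so a value that overflows is almost always a unit mix-up upstream rather than a genuinely late date.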

Are you creating span events with your app's tracer using the Span#add_event API? https://github.com/open-telemetry/opentelemetry-ruby/blob/cdce3cc1195a35cbaa48d00bea60e5fa64e19ab6/sdk/lib/opentelemetry/sdk/trace/span.rb#L152-L170

kaylareopelle, Dec 04 '24 23:12

Thanks for the response, @kaylareopelle.

We only just started using the OTLP exporter; we had not used it before.

No, we are not calling span.add_event.

We also tried creating spans with the code below, and it produces the same error:

require "opentelemetry/sdk"

def track_extended_warranty(extended_warranty)
  # Get the current span
  current_span = OpenTelemetry::Trace.current_span

  # And add useful stuff to it!
  current_span.add_attributes({
    "com.extended_warranty.id" => extended_warranty.id,
    "com.extended_warranty.timestamp" => extended_warranty.timestamp
  })
end
require "opentelemetry/sdk"

def do_work
  MyAppTracer.in_span("do_work") do |span|
    # do some work that the 'do_work' span tracks!
  end
end

manasaheggere, Dec 05 '24 09:12

Hi @manasaheggere, thanks for your responses! I'm not able to reproduce the error yet with the provided code.

We'll need to reproduce the error outside of your environment to debug further.

Here's a walkthrough on how to create a minimal, reproducible example: https://stackoverflow.com/help/minimal-reproducible-example

This is the code I've used to test so far: https://gist.github.com/kaylareopelle/ef261c12e3a3e1c2cce59a25050c23f0

I'm running this script while simultaneously running the OTel Collector available here. To run this collector, clone the opentelemetry-ruby repo, enter the examples/otel-collector directory, and run docker compose up to start the OTel Collector.

Can you update the gist and/or create a different reproduction script that raises the same error?

kaylareopelle, Dec 05 '24 23:12

Hi @kaylareopelle

We cloned the open-source opentelemetry-ruby repository and tested with it; we see only warnings, not traces. We also created a sample application to reproduce the issue, and with the same code we can see the traces locally in the Jaeger UI.

Even in our actual project, we do not see this issue locally. When we deploy the same code to pre-prod, we see the following errors:

ERROR -- : OpenTelemetry error: unexpected error in OTLP::Exporter#encode - bignum too big to convert into `unsigned long long' - /home/circleci/.rubygems/gems/opentelemetry-exporter-otlp-0.29.1/lib/opentelemetry/exporter/otlp/exporter.rb:326:in `initialize'
ERROR -- : OpenTelemetry error: Unable to export 61 spans

manasaheggere, Dec 06 '24 12:12

Is there anything else you can share with us?

Machine details like CPU architecture?

Are you deploying using containers? If so what is the image you're using?

Can you share the contents of the lock file that is generated and what version of protobuf is being installed?
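A small diagnostic sketch for gathering those details from inside the deployed container (for example with `bundle exec ruby` or `rails runner`). It uses only the standard library and RubyGems' `Gem.loaded_specs`; the `environment_report` name is hypothetical.

```ruby
require 'rbconfig'

# Hypothetical helper: collect CPU architecture, word size, and the versions of
# the gems involved (nil when a gem is not activated in this process).
def environment_report
  {
    ruby: RUBY_VERSION,
    platform: RUBY_PLATFORM,
    host_cpu: RbConfig::CONFIG['host_cpu'],
    pointer_bits: [0].pack('J').bytesize * 8, # 'J' packs a pointer-width integer
    google_protobuf: Gem.loaded_specs['google-protobuf']&.version&.to_s,
    otlp_exporter: Gem.loaded_specs['opentelemetry-exporter-otlp']&.version&.to_s
  }
end

environment_report.each { |key, value| puts "#{key}: #{value || 'not loaded'}" }
```

Running it under the same Bundler context as the app matters, since that is what determines which google-protobuf build is actually loaded.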

arielvalentin, Dec 06 '24 12:12

Hi @arielvalentin

We are deploying our application's Docker image to AWS using ArgoCD and Kubernetes. We are using protobuf version 4.29.0. Please find the Gemfile.lock attached: lock_file.zip

manasaheggere, Dec 09 '24 12:12

Hi @manasaheggere, thanks for sharing details around your architecture and your lock file.

We discussed this issue during the SIG and think we may need some additional logging around the exporter to try to get a better sense of what the data that's being rejected looks like. I have some code with extra logging I'd like you to add to your app.

I have two options for how you can add it:

  1. I created a branch I'd like you to use to install the OTLP exporter. To install it, update the line in your Gemfile for opentelemetry-exporter-otlp to:
gem 'opentelemetry-exporter-otlp', github: 'kaylareopelle/opentelemetry-ruby', branch: 'debug-unsigned-long-long', glob: 'exporter/otlp/*.gemspec'
  2. Alternatively, you can monkey patch your exporter by adding the code in this gist to your opentelemetry.rb file before you call OpenTelemetry::SDK.configure.

Could you run the exporter with the additional logging code in the environment where the error is raised and share the logs with us? Please remove any information that might be considered sensitive from the file before you post.

kaylareopelle, Dec 13 '24 01:12

> Hi @arielvalentin
>
> We are deploying our application docker image in AWS cloud using ArgoCD and Kubernetes.
>
> We have used protobuf 4.29.0 version. Please find the attached Gemfile.lock.
>
> lock_file.zip

What is the machine architecture? 32-bit or 64-bit?

arielvalentin, Dec 13 '24 03:12

Hi @kaylareopelle

I followed option 1 and attached a screenshot of the logs. I am seeing more log lines similar to those in the screenshot; please let me know if you need any specific ones shared. [Screenshot 2024-12-18 at 5 13 54 PM]

We are also getting one more error:

ERROR -- : OpenTelemetry error: unexpected configuration error due to attribute values must be (array of) strings, integers, floats, or booleans - OpenTelemetry::SDK::ConfigurationError - /home/circleci/.rubygems/gems/opentelemetry-sdk-1.6.0/lib/opentelemetry/sdk.rb:69:in `rescue in configure'
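The SDK raises that validation error when an attribute value is not one of the listed types. Where the offending value comes from in this app is not known from the thread; a `Time`, which `extended_warranty.timestamp` in the earlier snippet may well be, is one possibility. A minimal, hypothetical sketch of coercing values before they are used as attributes (`otel_attribute_value` and the ISO 8601 choice for `Time` are assumptions, not SDK behavior):

```ruby
require 'time'

# OTLP carries integer attribute values as int64.
INT64_RANGE = (-2**63)..(2**63 - 1)

# Hypothetical helper: coerce a Ruby value into one of the types named in the
# error message (String, Integer, Float, true/false, or an Array of those).
# It does not check that arrays are homogeneous.
def otel_attribute_value(value)
  case value
  when String, Float, true, false then value
  when Integer then INT64_RANGE.cover?(value) ? value : value.to_s
  when Numeric then value.real? ? value.to_f : value.to_s # e.g. BigDecimal, Rational
  when Symbol then value.to_s
  when Time then value.getutc.iso8601(9) # getutc leaves the original Time untouched
  when Array then value.map { |element| otel_attribute_value(element) }
  else value.to_s
  end
end

attributes = {
  'com.extended_warranty.id' => 42,
  'com.extended_warranty.timestamp' => Time.utc(2024, 12, 18, 17, 13, 54)
}.transform_values { |v| otel_attribute_value(v) }
p attributes
```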

Hi @arielvalentin

Machine architecture is 64-bit.

manasaheggere, Dec 18 '24 11:12

Can you share your Gemfile.lock?

arielvalentin, Dec 18 '24 13:12

Hi @arielvalentin

Please find the Gemfile.lock attached: gemfile_lock.zip

manasaheggere, Dec 19 '24 12:12

Hi team, any luck with a fix?

manasaheggere, Dec 26 '24 04:12

We've run into the end-of-year holidays here, so I don't think anyone has taken a closer look since then.

To get better diagnostic information, you may want to try adding a custom error handler; it would give us more detail about what is happening there.
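A minimal sketch of such a handler, assuming the OpenTelemetry API's `OpenTelemetry.error_handler=` setter, whose callable receives `exception:` and `message:` keyword arguments. The log lines earlier in this thread contain only the message, the exception message, and a single backtrace frame; this hypothetical `build_detailed_error_handler` also logs the exception class and more frames.

```ruby
require 'logger'

# Hypothetical builder for a more verbose OpenTelemetry error handler.
def build_detailed_error_handler(logger, frames: 20)
  lambda do |exception: nil, message: nil|
    lines = ["OpenTelemetry error: #{message}"]
    if exception
      lines << "#{exception.class}: #{exception.message}"
      lines.concat(Array(exception.backtrace).first(frames))
    end
    logger.error(lines.join("\n  "))
  end
end

# Install it only once the OpenTelemetry API is loaded (e.g. in opentelemetry.rb).
# The logger is captured here, so install after any custom logger is configured.
if defined?(OpenTelemetry)
  OpenTelemetry.error_handler = build_detailed_error_handler(OpenTelemetry.logger)
end
```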

arielvalentin, Jan 01 '25 19:01

👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this issue will be closed eventually by the stale bot.

github-actions[bot], Feb 01 '25 02:02

We've had no luck reproducing this problem so far. Next, I'd ask that you enable one instrumentation at a time until we pinpoint the one that is generating an Event with a BigNum (BigDecimal) that is incompatible with google-protobuf 4.29.
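A minimal sketch of that bisection, assuming `c.use_all` in opentelemetry.rb is replaced with explicit `c.use` calls (an SDK configurator method). The environment variable `OTEL_DEBUG_INSTRUMENTATIONS` and the helper `instrumentations_to_enable` are hypothetical names chosen for this example, not OpenTelemetry settings.

```ruby
# Hypothetical helper: parse a comma-separated list of instrumentation class
# names from OTEL_DEBUG_INSTRUMENTATIONS (a made-up variable for this example).
def instrumentations_to_enable(env = ENV)
  env.fetch('OTEL_DEBUG_INSTRUMENTATIONS', '')
     .split(',')
     .map(&:strip)
     .reject(&:empty?)
end

# Only runs once the SDK is loaded. In a real app, merge the loop into the
# existing configure block in place of c.use_all, keeping add_span_processor.
if defined?(OpenTelemetry::SDK)
  OpenTelemetry::SDK.configure do |c|
    instrumentations_to_enable.each { |name| c.use(name) }
  end
end
```

For example, deploy with `OTEL_DEBUG_INSTRUMENTATIONS=OpenTelemetry::Instrumentation::Rack,OpenTelemetry::Instrumentation::ActiveRecord`, check whether the encode error still appears, and narrow the list from there.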

arielvalentin, Feb 01 '25 04:02

@dazuma do you know anyone on the protobuf team that could help investigate this problem?

arielvalentin, Feb 21 '25 13:02

I don't think the original issue is the same problem, but I just encountered this error with the Sidekiq instrumentation, which (with Sidekiq v8) started passing integer millisecond times instead of float seconds.

See https://github.com/open-telemetry/opentelemetry-ruby-contrib/pull/1444
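To make the arithmetic concrete, assuming (as described in this thread) that a numeric timestamp is read as seconds and scaled by 10^9: an epoch-milliseconds value from December 2024 becomes about 1.7e21, far above the roughly 1.8e19 maximum of the unsigned 64-bit OTLP timestamp fields. A minimal sketch of converting such a value to a `Time` first; the helper name is hypothetical, and this is not necessarily how the linked PR fixes it.

```ruby
UINT64_MAX = 2**64 - 1

# Hypothetical helper: turn an Integer of epoch milliseconds into a Time,
# which leaves no doubt about the unit when passed as a span timestamp.
def time_from_epoch_millis(millis)
  Time.at(millis / 1000, millis % 1000, :millisecond)
end

enqueued_at_ms = 1_733_300_000_123 # epoch milliseconds, December 2024

# Misread as seconds and scaled to nanoseconds, the value no longer fits.
puts enqueued_at_ms * 1_000_000_000 > UINT64_MAX # prints true

# Converted explicitly, it is an ordinary Time.
p time_from_epoch_millis(enqueued_at_ms).getutc
```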

dmathieu, Mar 12 '25 09:03

I had this same problem because I was passing nanoseconds as an Integer; they were assumed to be seconds and then multiplied, overflowing this value. The docs at https://www.rubydoc.info/gems/opentelemetry-api/OpenTelemetry/Trace/Tracer#in_span-instance_method clearly state for start_timestamp:

start_timestamp (optional Integer) (defaults to: nil) — nanoseconds since Epoch

But that's inaccurate: it actually expects a Time, a Float of seconds, or similar. I opened #1841.
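A minimal sketch of the mismatch described above, assuming (per this comment; check the SDK source for your version) that a numeric `start_timestamp` is converted roughly as `(timestamp.to_r * 1_000_000_000).to_i`. `sdk_style_nanos` is a hypothetical stand-in for that conversion, and `time_from_epoch_nanos` shows one way to turn an Integer of nanoseconds into an exact `Time` before passing it.

```ruby
UINT64_MAX = 2**64 - 1

# Hypothetical stand-in for the conversion described above: a numeric timestamp
# is treated as seconds and scaled to nanoseconds.
def sdk_style_nanos(timestamp)
  (timestamp.to_r * 1_000_000_000).to_i
end

# Convert an Integer of nanoseconds since the epoch into an exact Time.
def time_from_epoch_nanos(nanos)
  Time.at(nanos / 1_000_000_000, nanos % 1_000_000_000, :nanosecond)
end

start_ns = 1_745_359_200_000_000_000 # nanoseconds since the epoch, April 2025

puts sdk_style_nanos(start_ns) > UINT64_MAX                       # prints true: nanos read as seconds
puts sdk_style_nanos(time_from_epoch_nanos(start_ns)) == start_ns # prints true: a Time round-trips
```

With the API, the call would then look like `tracer.in_span('work', start_timestamp: time_from_epoch_nanos(start_ns)) { |span| ... }`.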

cretz, Apr 22 '25 22:04