Profiling native extension does not detect libdatadog upgrades/downgrades

Open ivoanjo opened this issue 2 years ago • 8 comments

This is a bit of a corner case that occurred to me the other day, and I want to track it so that we don't forget about it.

TL;DR: The workaround for this is to reinstall ddtrace after changing libdatadog versions.

Current behaviour:

Because ddtrace compiles and links against libdatadog at installation time, it becomes "bound" to the libdatadog that was available at that time, and does not respect any changes that are made after that.

Consider this gems.rb file:

source 'https://rubygems.org'

gem 'google-protobuf'
gem 'ddtrace'
gem 'libdatadog', '= 0.7.0.1.0'

and a Ruby installation that has no libdatadog or ddtrace version installed:

root@c7311b47f69d:/app/libdatadog-detect-missing# gem uninstall libdatadog ddtrace
Gem 'libdatadog' is not installed
Gem 'ddtrace' is not installed

Now let's run bundle install:

root@c7311b47f69d:/app/libdatadog-detect-missing# bundle install
Fetching gem metadata from https://rubygems.org/...
Resolving dependencies...
Using bundler 2.3.6
Using debase-ruby_core_source 0.10.16
Using ffi 1.15.5
Using msgpack 1.5.6
Using google-protobuf 3.21.5 (x86_64-linux)
Using libddwaf 1.3.0.2.0 (x86_64-linux)
Fetching libdatadog 0.7.0.1.0 (x86_64-linux)
Installing libdatadog 0.7.0.1.0 (x86_64-linux)
Fetching ddtrace 1.3.0
Installing ddtrace 1.3.0 with native extensions
Bundle complete! 3 Gemfile dependencies, 8 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

At this point, ddtrace gets installed and links to libdatadog 0.7.0.1.0.

root@c7311b47f69d:/app/libdatadog-detect-missing# ldd /usr/local/bundle/gems/ddtrace-1.3.0/ext/ddtrace_profiling_native_extension/ddtrace_profiling_native_extension.2.7.3_x86_64-linux.so
	linux-vdso.so.1 (0x00007fff51b8e000)
	libruby.so.2.7 => /usr/local/lib/libruby.so.2.7 (0x00007f1942476000)
	libddprof_ffi.so => /usr/local/bundle/gems/libdatadog-0.7.0.1.0-x86_64-linux/vendor/libdatadog-0.7.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/pkgconfig/../../lib/libddprof_ffi.so (0x00007f19421c5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1942038000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1941e77000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1941c59000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1941c36000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1941c2c000)
	libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f1941ba9000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1941ba4000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f1941b6a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1942833000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1941b50000)

But let's say that, as a regular Ruby user, you pick a different version of libdatadog and run bundle install again:

source 'https://rubygems.org'

gem 'google-protobuf'
gem 'ddtrace'
gem 'libdatadog', '= 0.7.0.1.1' # this was changed from 0.7.0.1.0 to 0.7.0.1.1

root@c7311b47f69d:/app/libdatadog-detect-missing# bundle install
Fetching gem metadata from https://rubygems.org/..
Resolving dependencies...
Using bundler 2.3.6
Using msgpack 1.5.6
Using ffi 1.15.5
Using google-protobuf 3.21.5 (x86_64-linux)
Using debase-ruby_core_source 0.10.16
Using libddwaf 1.3.0.2.0 (x86_64-linux)
Fetching libdatadog 0.7.0.1.1 (x86_64-linux) (was 0.7.0.1.0)
Installing libdatadog 0.7.0.1.1 (x86_64-linux) (was 0.7.0.1.0)
Using ddtrace 1.3.0
Bundle complete! 3 Gemfile dependencies, 8 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

Now asking for the version of libdatadog on the system will state that you're supposedly using 0.7.0.1.1:

root@c7311b47f69d:/app/libdatadog-detect-missing# bundle exec ruby -e "require 'libdatadog'; puts Libdatadog::VERSION"
0.7.0.1.1

but actually ddtrace is not using that version:

root@c7311b47f69d:/app/libdatadog-detect-missing# ldd /usr/local/bundle/gems/ddtrace-1.3.0/ext/ddtrace_profiling_native_extension/ddtrace_profiling_native_extension.2.7.3_x86_64-linux.so
	linux-vdso.so.1 (0x00007ffc33deb000)
	libruby.so.2.7 => /usr/local/lib/libruby.so.2.7 (0x00007f034211c000)
	libddprof_ffi.so => /usr/local/bundle/gems/libdatadog-0.7.0.1.0-x86_64-linux/vendor/libdatadog-0.7.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/lib/pkgconfig/../../lib/libddprof_ffi.so (0x00007f0341e6b000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0341cde000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0341b1d000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f03418ff000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f03418dc000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f03418d2000)
	libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f034184f000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f034184a000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f0341810000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f03424d9000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f03417f6000)

So you think you've upgraded/downgraded when in fact you haven't.
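The manual comparison above can be scripted. The following is a minimal sketch, not part of ddtrace or libdatadog: the helper names are hypothetical, and it assumes the linked library lives inside a gem directory named `libdatadog-<version>`, as in the `ldd` output shown earlier.

```ruby
# Hypothetical helper: extract the libdatadog gem version from the path that
# `ldd` reports for the linked shared library.
# Assumes a gem directory such as ".../gems/libdatadog-0.7.0.1.0-x86_64-linux/...".
def linked_libdatadog_version(ldd_output)
  match = ldd_output.match(%r{/gems/libdatadog-(\d+(?:\.\d+)*)(?=[-/])})
  match && match[1]
end

# True when the extension is linked against a different libdatadog version than
# the one Bundler resolved (or when no libdatadog gem path was found at all).
def libdatadog_mismatch?(ldd_output, resolved_version)
  linked_libdatadog_version(ldd_output) != resolved_version
end

# One line copied from the ldd output in this issue.
ldd_line = "libddprof_ffi.so => /usr/local/bundle/gems/libdatadog-0.7.0.1.0-x86_64-linux/" \
  "vendor/libdatadog-0.7.0/x86_64-linux/libdatadog-x86_64-unknown-linux-gnu/" \
  "lib/pkgconfig/../../lib/libddprof_ffi.so (0x00007f0341e6b000)"

puts linked_libdatadog_version(ldd_line)
puts libdatadog_mismatch?(ldd_line, "0.7.0.1.1")
```

In practice the first argument would be the full output of running `ldd` on the compiled extension, and the second the value of `Libdatadog::VERSION`.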

Further proof comes if you actually remove the old version:

root@c7311b47f69d:/app/libdatadog-detect-missing# gem uninstall libdatadog

Select gem to uninstall:
 1. libdatadog-0.7.0.1.0-x86_64-linux
 2. libdatadog-0.7.0.1.1-x86_64-linux
 3. All versions
> 1
Successfully uninstalled libdatadog-0.7.0.1.0-x86_64-linux

root@c7311b47f69d:/app/libdatadog-detect-missing# DD_PROFILING_ENABLED=true bundle exec ddtracerb exec ruby -e "require 'libdatadog'; puts Libdatadog::VERSION"
W, [2022-08-25T09:27:43.103258 #337]  WARN -- ddtrace: [ddtrace] Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'RuntimeError Failure to load ddtrace_profiling_native_extension.2.7.3_x86_64-linux due to libddprof_ffi.so: cannot open shared object file: No such file or directory' at '/usr/local/bundle/gems/ddtrace-1.3.0/lib/datadog/profiling/load_native_extension.rb:22:in `<top (required)>''
0.7.0.1.1

Bundler is still able to resolve a valid version BUT the profiler is broken since it's not actually using that version.
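When triaging logs, the name of the missing library can be pulled out of a warning like the one above. This is only a sketch for log analysis: the helper name is hypothetical, and it assumes the dynamic loader's usual "cannot open shared object file" wording, as it appears in the log line above.

```ruby
# Hypothetical log-triage helper: return the name of the shared library that the
# dynamic loader could not find, or nil if the message is about something else.
def missing_shared_library(log_message)
  match = log_message.match(/(\S+\.so[.\d]*): cannot open shared object file/)
  match && match[1]
end

# Excerpt of the warning from this issue.
warning = "Failure to load ddtrace_profiling_native_extension.2.7.3_x86_64-linux " \
  "due to libddprof_ffi.so: cannot open shared object file: No such file or directory"

puts missing_shared_library(warning)
```

A result of `libddprof_ffi.so` points at the libdatadog binding described in this issue, so reinstalling ddtrace is the thing to try.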

Since this only happens with libdatadog point releases, and we don't do those often, I doubt anyone's been bitten by this, but it's definitely a sharp edge that we should address.

Expected behaviour:

Ideally, the profiling native extension should automatically pick up and work with updated versions of libdatadog.

At minimum, it should a) detect that there's a mismatched libdatadog version; and b) provide a good error message stating what happened and how to fix it.
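A rough sketch of what that minimum could look like is below. It is not ddtrace's actual code: the method name is hypothetical, and it assumes the libdatadog version used at compile time gets recorded somewhere the extension can read it back.

```ruby
# Hypothetical check, not ddtrace's implementation.
# built_against: libdatadog version recorded when the extension was compiled
#                (assumption: this value is stored at build time).
# loaded:        libdatadog version that is active at runtime, or nil if none.
def libdatadog_mismatch_message(built_against:, loaded:)
  return nil if built_against == loaded

  "The profiling native extension was built against libdatadog #{built_against}, " \
    "but libdatadog #{loaded || 'none'} is currently loaded. " \
    "Reinstall ddtrace so that it is rebuilt against the current version."
end

puts libdatadog_mismatch_message(built_against: "0.7.0.1.0", loaded: "0.7.0.1.1")
```

At runtime, `loaded` could be taken from `Libdatadog::VERSION`, which is the constant queried earlier in this issue.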

Steps to reproduce

(See above)

ivoanjo • Aug 25 '22 09:08

Addendum: I suspect this issue can also be triggered when changing the platform on libdatadog, see https://github.com/DataDog/dd-trace-rb/issues/2652#issuecomment-1450539119 for details.

ivoanjo • Mar 01 '23 17:03

"The workaround for this is to reinstall ddtrace"

IIRC `gem pristine ddtrace` should do it.

lloeki • Apr 12 '23 09:04

I hit this issue when I upgraded the Datadog agent on our VMs. I'm confused, since I thought libdatadog would bundle the library rather than rely on the system lib.

I believe this is a bug in libdatadog, not in ddtrace.

chulkilee • Jul 04 '23 10:07

Sorry to hear you're affected, @chulkilee. Do you by any chance still have access to the logs, and can you share the error message you got?

I've been thinking that in some cases we may be able to provide a much better error message, but I wanted to double-check that it would cover your case.

ivoanjo • Jul 04 '23 11:07

W, [2023-07-04T08:35:31.895350 #29107]  WARN -- ddtrace: [ddtrace] (/app/shared/bundle/ruby/2.7.0/gems/ddtrace-1.12.1/lib/datadog/core/configuration/components.rb:103:in `startup!') Profiling was requested but is not supported, profiling disabled: There was an error loading the profiling native extension due to 'LoadError cannot load such file -- ddtrace_profiling_loader.2.7.8_x86_64-linux' at '/app/shared/bundle/ruby/2.7.0/gems/bootsnap-1.7.5/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `require''

Wait, maybe it was caused by the Ruby upgrade (2.7.7 to 2.7.8). I just noticed that the error message includes the path "2.7.0", which is not specific to the Ruby patch version (2.7.7 or 2.7.8). Maybe bundle install after the Ruby upgrade didn't trigger a reinstall of libdatadog here.

chulkilee • Jul 04 '23 11:07

Interesting... I think this one may not involve libdatadog at all (although the error message is similar and the fix is similar as well).

Looking at the log message, I suspect what happened was that when upgrading from 2.7.7 to 2.7.8, ddtrace itself was not reinstalled (and thus not recompiled).

At installation time, ddtrace includes the full Ruby version in the file names of the compiled parts of the profiler -- e.g. "ddtrace_profiling_loader.2.7.7_x86_64-linux". Thus, if the same installation gets reused on a different Ruby version, loading fails, because ddtrace looks for a file named after the running version (here "ddtrace_profiling_loader.2.7.8_x86_64-linux") and that file was never built.

This is on purpose (to avoid mismatches between the profiler and the Ruby version) but yeah, I can definitely see how the error message is super opaque and how none of the details I shared above are obvious.

And the fix is indeed the same -- reinstall ddtrace.
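To make the naming scheme concrete, here is a small sketch of how such a leftover build could be spotted. It is an illustration only: the method names are hypothetical, the name pattern is inferred from the log message in this thread, and a `.so` file extension (Linux) is assumed.

```ruby
# Assumption: the compiled loader is named after the full Ruby version and
# platform, matching the names seen in the log messages in this thread.
def expected_loader_name(ruby_version = RUBY_VERSION, platform = RUBY_PLATFORM)
  "ddtrace_profiling_loader.#{ruby_version}_#{platform}"
end

# Given the file names found in the extension directory, return the loaders that
# were compiled for a different Ruby version or platform than the running one.
def stale_loaders(file_names, ruby_version = RUBY_VERSION, platform = RUBY_PLATFORM)
  expected = expected_loader_name(ruby_version, platform)
  file_names
    .map { |name| File.basename(name, ".so") }
    .select { |name| name.start_with?("ddtrace_profiling_loader.") && name != expected }
end

# A gem installed under Ruby 2.7.7 and then reused under Ruby 2.7.8:
found = ["ddtrace_profiling_loader.2.7.7_x86_64-linux.so"]
puts stale_loaders(found, "2.7.8", "x86_64-linux").inspect
```

A non-empty result means the installed files were built for another Ruby, which is the situation where reinstalling ddtrace helps.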

I'll make a note to detect these errors and provide a better message.

ivoanjo • Jul 04 '23 12:07

Thanks for sharing the log message btw, it helps a lot!

ivoanjo • Jul 04 '23 12:07

PR to improve the log message: https://github.com/DataDog/dd-trace-rb/pull/2957

ivoanjo • Jul 10 '23 15:07