Python autoinstrumentation for musl libc based application containers

Open ilyamochalov opened this issue 2 years ago • 8 comments

Component(s)

instrumentation

Is your feature request related to a problem? Please describe.

Python autoinstrumentation for musl libc based application containers fails with the following error:

#16 2.190 ImportError: Error relocating /autoinstrumentation/psutil/_psutil_linux.abi3.so: __sched_cpufree: symbol not found
#16 2.191 Failed to auto initialize opentelemetry
#16 2.191 Traceback (most recent call last):
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
#16 2.191     _load_instrumentors(distro)
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
#16 2.191     raise exc
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
#16 2.191     distro.load_instrumentor(entry_point, skip_dep_check=True)
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
#16 2.191     instrumentor: BaseInstrumentor = entry_point.load()
#16 2.191   File "/autoinstrumentation/pkg_resources/__init__.py", line 2518, in load
#16 2.191     return self.resolve()
#16 2.191   File "/autoinstrumentation/pkg_resources/__init__.py", line 2524, in resolve
#16 2.191     module = __import__(self.module_name, fromlist=['__name__'], level=0)
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/system_metrics/__init__.py", line 79, in <module>
#16 2.191     import psutil
#16 2.191   File "/autoinstrumentation/psutil/__init__.py", line 102, in <module>
#16 2.191     from . import _pslinux as _psplatform
#16 2.191   File "/autoinstrumentation/psutil/_pslinux.py", line 25, in <module>
#16 2.191     from . import _psutil_linux as cext
#16 2.191 ImportError: Error relocating /autoinstrumentation/psutil/_psutil_linux.abi3.so: __sched_cpufree: symbol not found

Root cause: the current autoinstrumentation image is built against GNU libc (glibc), so its native extensions cannot be loaded in musl-based containers.

Describe the solution you'd like

  1. Add an extra build stage, based on an Alpine image, at https://github.com/open-telemetry/opentelemetry-operator/blob/v0.87.0/autoinstrumentation/python/Dockerfile#L12
  2. Copy the musl build of the instrumentation library into a separate path in the final image: https://github.com/open-telemetry/opentelemetry-operator/blob/v0.87.0/autoinstrumentation/python/Dockerfile#L22
  3. Add an extra annotation: instrumentation.opentelemetry.io/otel-python-auto-runtime: "linux-musl-x64"
  4. Update https://github.com/open-telemetry/opentelemetry-operator/blob/main/pkg/instrumentation/python.go so that the operator copies and loads the correct dependencies
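The selection step in items 3 and 4 could look roughly like the sketch below. It is written in Python for illustration only (the operator itself is written in Go) and is not the operator's actual code. The annotation name and the "linux-musl-x64" value come from this proposal, and /autoinstrumentation is the path seen in the traceback; the "linux-x64" default and the /autoinstrumentation-musl path are assumptions made up for the example.

```python
# Annotation name as proposed in this issue.
ANNOTATION = "instrumentation.opentelemetry.io/otel-python-auto-runtime"

# Hypothetical init-image layout: one directory per libc flavour.
RUNTIME_DIRS = {
    "linux-x64": "/autoinstrumentation",            # glibc build (assumed default value)
    "linux-musl-x64": "/autoinstrumentation-musl",  # musl build (hypothetical path)
}
DEFAULT_RUNTIME = "linux-x64"


def select_source_dir(pod_annotations):
    """Return the init-image directory to copy into the application container."""
    runtime = pod_annotations.get(ANNOTATION, DEFAULT_RUNTIME)
    try:
        return RUNTIME_DIRS[runtime]
    except KeyError:
        # Reject unknown values instead of silently injecting the wrong build.
        raise ValueError(f"unsupported value {runtime!r} for {ANNOTATION}") from None


if __name__ == "__main__":
    print(select_source_dir({ANNOTATION: "linux-musl-x64"}))
```

Pods without the annotation fall back to the glibc build, so existing workloads would keep their current behaviour.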

Describe alternatives you've considered

No response

Additional context

A similar change was made for .NET:

  • https://github.com/open-telemetry/opentelemetry-operator/pull/2087
  • https://github.com/open-telemetry/opentelemetry-operator/pull/2103

ilyamochalov avatar Oct 24 '23 07:10 ilyamochalov

Unlike dotnet, I believe this is a fault of the docker image we supply, not the instrumentation itself.

@open-telemetry/operator-approvers I think we need to make a concrete decision on what auto-instrumentation images we supply. For all appropriate languages, will we supply both musl- and glibc-based images? Or is dotnet a one-off case because of how the dotnet agent is supplied?

TylerHelmuth avatar Oct 24 '23 15:10 TylerHelmuth

@TylerHelmuth thank you for checking this issue.

Error messages such as _psutil_linux.abi3.so: __sched_cpufree: symbol not found indicate that the psutil package (a dependency of the Python OTel packages) was installed for a different C library implementation (glibc vs. musl) than the one in the application container. psutil ships a compiled C extension, so the copy that pip installs is built against one particular C library. Pip dependencies built against glibc won't work on musl systems.
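To check which C library a given application container uses, a heuristic along these lines may help. It is only a sketch: it assumes that musl installs its dynamic loader as /lib/ld-musl-<arch>.so.1 and that platform.libc_ver() reports "glibc" on glibc systems. The function name and both parameters are hypothetical and exist mainly so the logic can be tested without a real container.

```python
import glob
import platform


def detect_libc(musl_loaders=None, libc_name=None):
    """Return "glibc", "musl", or "unknown" for the running interpreter.

    Leave both parameters as None to inspect the current system.
    """
    if musl_loaders is None:
        # Assumption: musl's dynamic loader lives at /lib/ld-musl-<arch>.so.1.
        musl_loaders = glob.glob("/lib/ld-musl-*.so.1")
    if libc_name is None:
        # libc_ver() returns a (name, version) tuple; name is "glibc" on glibc.
        libc_name = platform.libc_ver()[0]

    if libc_name == "glibc":
        return "glibc"
    if libc_name == "musl" or musl_loaders:
        return "musl"
    return "unknown"


if __name__ == "__main__":
    print(detect_libc())
```

Running it inside the application image (not the init image) shows which build of the instrumentation libraries that container needs.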

The final autoinstrumentation images for .NET, Python, and other languages are simply one way to distribute language-specific auto-instrumentation libraries. For languages whose runtime depends on the system C library, I think we need to build the auto-instrumentation libraries against both glibc and musl and make both sets of artifacts available to the application. The OTel Kubernetes operator should then decide which artifact to inject into the app container.

ilyamochalov avatar Oct 25 '23 01:10 ilyamochalov

We discussed this issue during the SIG call today. We'd like to have a clean solution that auto-detects which libs to use and handles everything for the user, but we think finding a solution like that is unlikely.

Most likely we have to implement a dotnet-like solution where the user can specify the libs they need.

@srikanthccv do you or any other Python maintainers have any advice on how to handle this?

TylerHelmuth avatar Oct 26 '23 17:10 TylerHelmuth

I took a brief look at the dotnet solution. I think the same should work for Python as well. I will take some time to review the instrumentation side and see if there are any cases that require special handling.

srikanthccv avatar Oct 30 '23 18:10 srikanthccv

@srikanthccv thank you for taking a look. I will proceed with my PR proposing changes to the operator and the instrumentation Docker image (please review the Dockerfile in the PR linked above).

ilyamochalov avatar Oct 31 '23 09:10 ilyamochalov

@open-telemetry/operator-approvers the PR is ready. Could someone please review it? https://github.com/open-telemetry/opentelemetry-operator/pull/2266

ilyamochalov avatar Nov 07 '23 05:11 ilyamochalov

Bumped into the psutil stacktrace issue while exploring python autoinstrumentation as defined by the files in the e2e-instrumentation/instrumentation-python directory.

Looks like the Dockerfiles for the default init container and for the test app (published at ghcr.io/open-telemetry/opentelemetry-operator/e2e-test-app-python:main) use binary-incompatible base images: one uses python3.11 (glibc) and the other alpine.318 (musl).

pmcollins avatar Jul 02 '24 18:07 pmcollins

Also, the collector configs defined in the instrumentation directories (e.g. tests/e2e-instrumentation/instrumentation-python/00-install-collector.yaml) don't specify a metrics receiver, but python auto-instrumentation sends metrics, so you get a 404 in the logs because of the failed metrics exports. Adding a metrics receiver to the collector pipeline solves the problem.

pmcollins avatar Jul 02 '24 20:07 pmcollins