opentelemetry-operator
Python autoinstrumentation for musl libc based application containers
Component(s)
instrumentation
Is your feature request related to a problem? Please describe.
Python autoinstrumentation for musl libc based application containers fails with the following error:
#16 2.190 ImportError: Error relocating /autoinstrumentation/psutil/_psutil_linux.abi3.so: __sched_cpufree: symbol not found
#16 2.191 Failed to auto initialize opentelemetry
#16 2.191 Traceback (most recent call last):
#16 2.191 File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
#16 2.191 _load_instrumentors(distro)
#16 2.191 File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
#16 2.191 raise exc
#16 2.191 File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
#16 2.191 distro.load_instrumentor(entry_point, skip_dep_check=True)
#16 2.191 File "/autoinstrumentation/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
#16 2.191 instrumentor: BaseInstrumentor = entry_point.load()
#16 2.191 File "/autoinstrumentation/pkg_resources/__init__.py", line 2518, in load
#16 2.191 return self.resolve()
#16 2.191 File "/autoinstrumentation/pkg_resources/__init__.py", line 2524, in resolve
#16 2.191 module = __import__(self.module_name, fromlist=['__name__'], level=0)
#16 2.191 File "/autoinstrumentation/opentelemetry/instrumentation/system_metrics/__init__.py", line 79, in <module>
#16 2.191 import psutil
#16 2.191 File "/autoinstrumentation/psutil/__init__.py", line 102, in <module>
#16 2.191 from . import _pslinux as _psplatform
#16 2.191 File "/autoinstrumentation/psutil/_pslinux.py", line 25, in <module>
#16 2.191 from . import _psutil_linux as cext
#16 2.191 ImportError: Error relocating /autoinstrumentation/psutil/_psutil_linux.abi3.so: __sched_cpufree: symbol not found
Root cause: the current autoinstrumentation image is built against glibc, so its compiled dependencies do not load on musl libc.
Describe the solution you'd like
- Add an extra build stage to alpine base image at https://github.com/open-telemetry/opentelemetry-operator/blob/v0.87.0/autoinstrumentation/python/Dockerfile#L12
- Copy instrumentation library into final image into a separate path: https://github.com/open-telemetry/opentelemetry-operator/blob/v0.87.0/autoinstrumentation/python/Dockerfile#L22
- Add an extra annotation: instrumentation.opentelemetry.io/otel-python-auto-runtime: "linux-musl-x64"
- Update https://github.com/open-telemetry/opentelemetry-operator/blob/main/pkg/instrumentation/python.go to make the changes needed to copy and load the correct dependencies
Describe alternatives you've considered
No response
Additional context
Similar change was made for .Net
- https://github.com/open-telemetry/opentelemetry-operator/pull/2087
- https://github.com/open-telemetry/opentelemetry-operator/pull/2103
Unlike dotnet, I believe this is a fault of the docker image we supply, not the instrumentation itself.
@open-telemetry/operator-approvers I think we need to make a concrete decision on what auto-instrumentation images we supply. For all appropriate languages, will we supply both musl and glibc based images? Or is dotnet a one-off case because of how the dotnet agent is supplied?
@TylerHelmuth thank you for checking this issue.
psutil_linux.abi3.so: __sched_cpufree: symbol not found and similar error messages indicate that the psutil package (which is a dependency of the Python OTel packages) was installed on a system with a different C library implementation (glibc vs. musl). When pip installs psutil, its C extension is built against the system C library, so pip dependencies compiled against glibc won't work on musl systems.
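To make the glibc/musl distinction concrete: for prebuilt wheels, the target C library is visible in the wheel's platform tag (manylinux tags target glibc, musllinux tags target musl). The following is only a minimal sketch; the helper name wheel_libc and the example filenames are illustrative assumptions, not something the operator or pip provides.

```python
def wheel_libc(filename: str) -> str:
    """Guess which C library a wheel targets from its filename.

    Hypothetical helper for illustration. It relies only on the wheel
    naming convention, where the last dash-separated field is the
    platform tag (possibly several tags joined by dots).
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")

    # Strip ".whl" and take the final field: the (compressed) platform tag.
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    tags = platform_tag.split(".")

    if any(tag.startswith("musllinux") for tag in tags):
        return "musl"
    if any(tag.startswith("manylinux") for tag in tags):
        return "glibc"
    # Pure-Python ("any") or non-Linux wheels carry no libc information.
    return "unknown"


if __name__ == "__main__":
    # Illustrative filenames, not taken from the issue.
    for name in (
        "psutil-5.9.6-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "psutil-5.9.6-cp36-abi3-musllinux_1_1_x86_64.whl",
    ):
        print(name, "->", wheel_libc(name))
```

A wheel classified as glibc here contains shared objects that can fail to relocate on an Alpine-based container, which is the failure shown in the traceback.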
Final autoinstrumentation images for .NET, Python, and other languages are simply one way to distribute programming language-specific auto-instrumentation libraries. I think that for languages whose runtimes depend on the system C library, we need to build the auto-instrumentation libraries against both glibc and musl and ship both sets of artifacts to the application. The OTel Kubernetes operator should then decide which artifact needs to be injected into the app container.
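The selection step could look roughly like the sketch below. It is written in Python for illustration only (the operator itself is Go) and makes several assumptions: the annotation name and the "linux-musl-x64" value come from the proposal in this issue, while the "linux-x64" default value and the /autoinstrumentation-musl path are hypothetical.

```python
# Annotation proposed in this issue.
RUNTIME_ANNOTATION = "instrumentation.opentelemetry.io/otel-python-auto-runtime"

# Assumed default value, mirroring the naming of the proposed musl value.
DEFAULT_RUNTIME = "linux-x64"

# Hypothetical layout of the init container image: one directory per libc.
ARTIFACT_PATHS = {
    "linux-x64": "/autoinstrumentation",            # glibc build
    "linux-musl-x64": "/autoinstrumentation-musl",  # musl build (assumed path)
}


def select_artifact_path(pod_annotations: dict) -> str:
    """Return the directory to copy into the app container.

    Falls back to the glibc build when the annotation is absent and
    rejects values that have no matching artifact set.
    """
    runtime = pod_annotations.get(RUNTIME_ANNOTATION, DEFAULT_RUNTIME)
    try:
        return ARTIFACT_PATHS[runtime]
    except KeyError:
        supported = ", ".join(sorted(ARTIFACT_PATHS))
        raise ValueError(
            f"unsupported runtime {runtime!r}; expected one of: {supported}"
        ) from None


if __name__ == "__main__":
    print(select_artifact_path({}))
    print(select_artifact_path({RUNTIME_ANNOTATION: "linux-musl-x64"}))
```

Rejecting unknown values instead of silently falling back keeps a typo in the annotation from reproducing the original import error at application startup.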
We discussed this issue during the SIG call today. We'd like to have a clean solution that auto-detects which libs to use and handles everything for the user, but we think finding a solution like that is unlikely.
Most likely we have to implement a dotnet-like solution where the user can specify the libs they need.
@srikanthccv do you or any other Python maintainers have any advice on how to handle this?
I took a brief look at the dotnet solution. I think the same should work for Python as well. I will take some time to review the instrumentation side and see if there are any cases that require special handling.
@srikanthccv thank you for taking a look. I will proceed with my PR proposing changes to operator and instr docker image (please review dockerfile on the PR link above)
@open-telemetry/operator-approvers the PR is ready. Can someone please review it? https://github.com/open-telemetry/opentelemetry-operator/pull/2266
Bumped into the psutil stacktrace issue while exploring python autoinstrumentation as defined by the files in the e2e-instrumentation/instrumentation-python directory.
Looks like the Dockerfiles for the default init container and for the test app (published at ghcr.io/open-telemetry/opentelemetry-operator/e2e-test-app-python:main) use binary-incompatible base images: one uses python3.11 (glibc) and the other alpine3.18 (musl).
Also, the collector configs defined in the instrumentation directories (e.g. tests/e2e-instrumentation/instrumentation-python/00-install-collector.yaml) don't specify a metrics receiver, but python auto-instrumentation sends metrics, so you get a 404 in the logs because of the failed metrics exports. Adding a metrics receiver to the collector pipeline solves the problem.