opentelemetry-operator

[auto-instrumentation-python] Having an issue with Python auto-instrumentation in the OpenTelemetry operator

Open xwgao opened this issue 2 years ago • 6 comments

Component(s)

instrumentation

What happened?

Description

I installed the Community OpenTelemetry Operator 0.89.0 in my OpenShift 4.12.22 cluster and created an OpenTelemetry Instrumentation and a collector in my namespace. I then added the annotation instrumentation.opentelemetry.io/inject-python: "instrumentation" to my deployment (which uses Python) in the same namespace. After the pod restarted, I found the error messages below in the pod log.

    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
Failed to auto initialize opentelemetry
Traceback (most recent call last):
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
    _load_instrumentors(distro)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
    raise exc
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
    distro.load_instrumentor(entry_point, skip_dep_check=True)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
    instrumentor: BaseInstrumentor = entry_point.load()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
    import sqlite3
  File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
...
Failed to export batch code: 404, reason: 404 page not found
...

Steps to Reproduce

  1. Install Community OpenTelemetry Operator 0.89.0 in OpenShift 4.12.22 cluster.
  2. In my namespace, create an OpenTelemetry instrumentation as below.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  annotations:
    instrumentation.opentelemetry.io/default-auto-instrumentation-apache-httpd-image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
    instrumentation.opentelemetry.io/default-auto-instrumentation-dotnet-image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.1.0
    instrumentation.opentelemetry.io/default-auto-instrumentation-go-image: >-
      ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.8.0-alpha
    instrumentation.opentelemetry.io/default-auto-instrumentation-java-image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.31.0
    instrumentation.opentelemetry.io/default-auto-instrumentation-nginx-image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
    instrumentation.opentelemetry.io/default-auto-instrumentation-nodejs-image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.44.0
    instrumentation.opentelemetry.io/default-auto-instrumentation-python-image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.41b0
  name: instrumentation
  namespace: my-namespace
  labels:
    app.kubernetes.io/managed-by: opentelemetry-operator
spec:
  exporter:
    endpoint: 'http://otel-collector-headless:4317'
  java:
    env:
      - name: OTEL_INSTRUMENTATION_LIBERTY_ENABLED
        value: 'true'
      - name: OTEL_METRICS_EXPORTER
        value: none
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.31.0
    resources:
      limits:
        cpu: 500m
        memory: 64Mi
      requests:
        cpu: 50m
        memory: 64Mi
  sampler:
    argument: '1'
    type: parentbased_traceidratio
  go:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.8.0-alpha
    resourceRequirements:
      limits:
        cpu: 500m
        memory: 32Mi
      requests:
        cpu: 50m
        memory: 32Mi
  nodejs:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.44.0
    resourceRequirements:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 128Mi
  resource: {}
  apacheHttpd:
    configPath: /usr/local/apache2/conf
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
    resourceRequirements:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 1m
        memory: 128Mi
    version: '2.4'
  propagators:
    - tracecontext
    - baggage
    - b3
  dotnet:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.1.0
    resourceRequirements:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 128Mi
  nginx:
    configFile: /etc/nginx/nginx.conf
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
    resourceRequirements:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 1m
        memory: 128Mi
  python:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: 'http://otel-collector-headless:4318'
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.41b0
    resourceRequirements:
      limits:
        cpu: 500m
        memory: 32Mi
      requests:
        cpu: 50m
        memory: 32Mi
  3. Create an OpenTelemetry collector as below.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  labels:
    app.kubernetes.io/managed-by: opentelemetry-operator
  name: otel
  namespace: my-namespace
spec:
  observability:
    metrics: {}
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:

    processors:
      batch:
        timeout: 10s
        send_batch_size: 10000
      metricstransform:
        transforms:
          - include: my-test.duration
            match_type: regexp
            action: update
            operations:
              - action: update_label
                label: http.url
                new_label: url
              - action: update_label
                label: http.method
                new_label: method
              - action: update_label
                label: http.status_code
                new_label: code

    exporters:
      logging:
        verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:8889"
        send_timestamps: true
        metric_expiration: 1440m

    connectors:
      spanmetrics:
        namespace: my-test
        histogram:
          unit: s
          explicit:
            buckets: [10ms, 100ms, 200ms, 400ms, 800ms, 1s, 1200ms, 1400ms, 1600ms, 1800ms, 2s, 4s, 6s, 8s, 10s]
        dimensions:
          - name: http.method
          - name: http.status_code
          - name: http.url
          - name: http.route
          - name: http.host

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [spanmetrics, logging]
        metrics:
          receivers: [spanmetrics]
          processors: [batch, metricstransform]
          exporters: [prometheus, logging]
  mode: statefulset
  resources: {}
  managementState: managed
  upgradeStrategy: automatic
  ingress:
    route: {}
  targetAllocator:
    prometheusCR:
      scrapeInterval: 30s
    resources: {}
  image: >-
    ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.89.0
  replicas: 1
  updateStrategy: {}
  podDisruptionBudget:
    maxUnavailable: 1
  4. Add the annotation below to my deployment (which uses Python) in the same namespace and save the changes. The pod then restarts.
        instrumentation.opentelemetry.io/inject-python: instrumentation
  5. After the pod restarts, the pod log shows the same error messages as in the Description above.

Expected Result

Python auto-instrumentation works in my pod's container.

Actual Result

Python auto-instrumentation failed to initialize because the '_sqlite3' module was not found.

Kubernetes Version

v1.25.10+8c21020

Operator version

0.89.0

Collector version

0.89.0

Environment information

Environment

Platform: OpenShift 4.12.22 cluster
Python3 version: Python 3.9.16

Log output

Instrumenting of sqlite3 failed
Traceback (most recent call last):
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
    distro.load_instrumentor(entry_point, skip_dep_check=True)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
    instrumentor: BaseInstrumentor = entry_point.load()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
    import sqlite3
  File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
Failed to auto initialize opentelemetry
Traceback (most recent call last):
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
    _load_instrumentors(distro)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
    raise exc
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
    distro.load_instrumentor(entry_point, skip_dep_check=True)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
    instrumentor: BaseInstrumentor = entry_point.load()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
    import sqlite3
  File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
2023-12-01 07:27:34,096 INFO Included extra file "/etc/supervisor/conf.d/coreidp-login.conf" during parsing
2023-12-01 07:27:34,096 INFO Included extra file "/etc/supervisor/conf.d/filebeat.conf" during parsing
2023-12-01 07:27:34,100 INFO RPC interface 'supervisor' initialized
2023-12-01 07:27:34,100 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2023-12-01 07:27:34,100 INFO supervisord started with pid 1
2023-12-01 07:27:35,104 INFO spawned: 'coreidp-login' with pid 15
2023-12-01 07:27:35,107 INFO spawned: 'filebeat' with pid 18
{"log.level":"warn","@timestamp":"2023-12-01T07:27:35.546Z","log.origin":{"file.name":"beater/filebeat.go","file.line":175},"message":"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning.","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-12-01T07:27:35.547Z","log.origin":{"file.name":"beater/filebeat.go","file.line":307},"message":"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning.","service.name":"filebeat","ecs.version":"1.6.0"}
yarn run v1.22.19
warning Skipping preferred cache folder "/.cache/yarn" because it is not writable.
warning Selected the next writable cache folder in the list, will be "/tmp/.yarn-cache-1000910000".
$ cross-env NODE_ENV=production ROOT_PATH=$npm_package_config_root_path nodemon ./app.js -w server -w config
warning Cannot find a suitable global folder. Tried these: "/usr/local, /.yarn"
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): server/**/* config
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node ./app.js`
Express server listening on port http://localhost:3003
Express server listening on secured port https://localhost:9443
2023-12-01 07:27:45,572 INFO success: filebeat entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
Failed to export batch code: 404, reason: 404 page not found

2023-12-01 07:28:45,648 INFO success: coreidp-login entered RUNNING state, process has stayed up for > than 70 seconds (startsecs)
Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

(node:63) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node --trace-warnings ...` to show where the warning was created)
Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Additional context

No response

xwgao, Dec 01 '23 07:12

These types of issues with Python auto-instrumentation and the operator are almost always caused by your app's packages not being compatible with Python's auto-instrumentation.

Some things to try:

  1. Upgrade your Python packages or your Python version.
  2. Instead of using the operator to do the injection, add the Python auto-instrumentation to the app yourself. This will confirm whether it is a Python problem rather than an operator problem.
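
A quick way to narrow this down, offered as a minimal sketch rather than as part of the original comment: run the check below with the same interpreter the application uses (Python 3.9 under /usr/local in the traceback). It only asks the import system whether the `_sqlite3` extension module can be located; `has_module` is a hypothetical helper name introduced for illustration.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if the import system can locate a top-level module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parents or modules without a spec.
        return False


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    # `_sqlite3` is the C extension that the stdlib `sqlite3` package imports.
    for name in ("_sqlite3", "sqlite3"):
        status = "found" if has_module(name) else "MISSING"
        print(f"{name}: {status}")
```

If `_sqlite3` is reported as missing, the interpreter in the image was most likely built without SQLite support, which would point at the image rather than the operator.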

TylerHelmuth, Dec 01 '23 08:12

@TylerHelmuth Is there any way to ignore or bypass this error using the operator? Thanks.

xwgao, Dec 04 '23 08:12

@TylerHelmuth I added the env var OTEL_PYTHON_DISABLED_INSTRUMENTATIONS (value: sqlite3) to my deployment, and after the pod restarted the error was gone. But I still cannot find any traces (spans) collected for my Python service. Any idea about this? Thanks.
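
For readers unfamiliar with this variable, here is a simplified model of its documented behaviour: it holds a comma-separated list of instrumentation entry point names, and any instrumentor whose name appears in the list is skipped at startup. This is a sketch under that assumption, not the actual code in `_load.py`; `enabled_instrumentors` is a hypothetical name.

```python
import os
from typing import Iterable, List, Mapping


def enabled_instrumentors(
    entry_point_names: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Drop every instrumentor named in OTEL_PYTHON_DISABLED_INSTRUMENTATIONS."""
    raw = env.get("OTEL_PYTHON_DISABLED_INSTRUMENTATIONS", "")
    # Split on commas and ignore surrounding whitespace and empty items.
    disabled = {item.strip() for item in raw.split(",") if item.strip()}
    return [name for name in entry_point_names if name not in disabled]


if __name__ == "__main__":
    # Hypothetical set of installed instrumentors, for illustration only.
    names = ["sqlite3", "flask", "requests"]
    print(enabled_instrumentors(names, {"OTEL_PYTHON_DISABLED_INSTRUMENTATIONS": "sqlite3"}))
```

Disabling `sqlite3` this way only prevents that one instrumentor from loading; it says nothing about whether the remaining instrumentors produce or export spans.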

xwgao, Dec 04 '23 09:12

I added the env var OTEL_PYTHON_DISABLED_INSTRUMENTATIONS (value: sqlite3) into my deployment,

Note that the env var can also be added to the env field of the Instrumentation CR.

I am not sure why your app is not producing any data.
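
As an illustration of the CR-based approach, the sketch below builds a JSON merge patch for the `spec.python.env` field shown in the Instrumentation CR earlier in this issue. `python_env_patch` is a hypothetical helper. Because a JSON merge patch replaces a list wholesale, existing entries (such as OTEL_EXPORTER_OTLP_ENDPOINT in that CR) have to be included alongside the new one.

```python
import json
from typing import Dict


def python_env_patch(env: Dict[str, str]) -> str:
    """Build a JSON merge patch that sets spec.python.env on an Instrumentation CR."""
    entries = [{"name": name, "value": value} for name, value in env.items()]
    return json.dumps({"spec": {"python": {"env": entries}}}, indent=2)


if __name__ == "__main__":
    print(
        python_env_patch(
            {
                # Existing entry from the CR in this issue, kept so it is not lost.
                "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector-headless:4318",
                # New entry that disables the failing instrumentor.
                "OTEL_PYTHON_DISABLED_INSTRUMENTATIONS": "sqlite3",
            }
        )
    )
```

The printed document could then be applied with a merge-type patch (for example via `kubectl patch --type merge`), or the same two entries can simply be written into the CR's YAML by hand.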

pavolloffay, Dec 04 '23 12:12

I resolved the sqlite3 error and opened another GitHub issue: https://github.com/open-telemetry/opentelemetry-python/issues/3573. Can anyone help with this? Thanks a lot.

xwgao, Dec 13 '23 09:12

I resolved the sqlite3 error and opened another github issue open-telemetry/opentelemetry-python#3573. Can any one help on this? Thanks a lot.

How did you resolve this issue? For us, disabling instrumentation for sqlite3 only removed the error; log instrumentation still doesn't work. Here is my Instrumentation CR spec:

spec:
  apacheHttpd:
    configPath: /usr/local/apache2/conf
    version: '2.4'
  dotnet:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.5.0
  env:
    - name: OTEL_EXPORTER_OTLP_TIMEOUT
      value: '200'
    - name: OTEL_LOGS_EXPORTER
      value: otlp_proto_http
    - name: OTEL_EXPORTER_OTLP_HTTP_LOGS_ENDPOINT
      value: >-
        http://obs-gateway-collector.test.svc.cluster.local:4318/v1/logs
  exporter:
    endpoint: >-
      http://obs-python-collector.test.svc.cluster.local:4318
  java:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.31.0
  nodejs:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.34.0
  python:
    env:
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: 'true'
      - name: OTEL_PYTHON_LOG_LEVEL
        value: debug
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.44b0
  resource: {}
  sampler:
    type: always_on
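
When checking a spec like this, it may help to spell out how an OTLP/HTTP exporter is expected to pick its URL under the OpenTelemetry exporter specification: a signal-specific variable such as OTEL_EXPORTER_OTLP_LOGS_ENDPOINT is used as-is, otherwise `/v1/logs` is appended to OTEL_EXPORTER_OTLP_ENDPOINT, and the default base is http://localhost:4318. The sketch below models only those rules. `resolve_otlp_http_endpoint` is a hypothetical helper, not an SDK function, and the example assumes the base endpoint ends up in OTEL_EXPORTER_OTLP_ENDPOINT.

```python
from typing import Mapping

# URL paths that the OTLP/HTTP specification assigns to each signal.
_SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}


def resolve_otlp_http_endpoint(signal: str, env: Mapping[str, str]) -> str:
    """Resolve the export URL for `signal` from OTLP exporter environment variables."""
    path = _SIGNAL_PATHS[signal]
    # 1. A signal-specific endpoint wins and is used exactly as given.
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific
    # 2. Otherwise the per-signal path is appended to the base endpoint.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    return base.rstrip("/") + "/" + path


if __name__ == "__main__":
    env = {
        # Assumed mapping of spec.exporter.endpoint from the CR above.
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://obs-python-collector.test.svc.cluster.local:4318",
        # Variable name exactly as written in the CR above.
        "OTEL_EXPORTER_OTLP_HTTP_LOGS_ENDPOINT": "http://obs-gateway-collector.test.svc.cluster.local:4318/v1/logs",
    }
    print(resolve_otlp_http_endpoint("logs", env))
```

Under these rules a variable named OTEL_EXPORTER_OTLP_HTTP_LOGS_ENDPOINT is never consulted, so logs would go to the base endpoint instead of the gateway collector. That may be worth checking, although nothing in this thread confirms it is the cause.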

surabhi28, May 08 '24 06:05