community.general

Opentelemetry stops sending traces with 9.0.0

Open moserke opened this issue 1 year ago • 12 comments

Summary

When going from 8.6.2 to 9.0.0 the opentelemetry callback stops sending traces to the endpoint. Same exact configuration and traces get forwarded in 8.6.2 but go nowhere in 9.0.0. I suspect it's due to how the exporter is getting picked but can't seem to figure out how to make it work.

otel_exporter = None
if store_spans_in_file:
    otel_exporter = InMemorySpanExporter()
    processor = SimpleSpanProcessor(otel_exporter)
else:
    if otel_exporter_otlp_traces_protocol == 'grpc':
        otel_exporter = GRPCOTLPSpanExporter()
    else:
        otel_exporter = HTTPOTLPSpanExporter()
    processor = BatchSpanProcessor(otel_exporter)
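
A minimal sketch of the branch structure in the excerpt above, returning class names as strings so it runs without the OpenTelemetry packages installed. The function name choose_exporter is hypothetical and not part of the plugin. It illustrates one possible way the in-memory branch could be taken unintentionally: any non-empty string, including the literal text "None", is truthy in Python. This is an illustration of a failure mode worth ruling out, not a confirmed diagnosis.

```python
def choose_exporter(store_spans_in_file, protocol):
    """Mirror the if/else structure of the excerpt (hypothetical helper).

    Returns (exporter_name, processor_name) as plain strings instead of
    instantiating the real OpenTelemetry classes.
    """
    if store_spans_in_file:
        # Any truthy value lands here, so nothing is sent over OTLP.
        return ("InMemorySpanExporter", "SimpleSpanProcessor")
    if protocol == "grpc":
        return ("GRPCOTLPSpanExporter", "BatchSpanProcessor")
    # Every other protocol value falls through to the HTTP exporter.
    return ("HTTPOTLPSpanExporter", "BatchSpanProcessor")


if __name__ == "__main__":
    # Compare a real None with the string "None" read from a config file.
    for value in (None, "", "None", "/dev/stdout"):
        print(repr(value), "->", choose_exporter(value, "grpc"))
```

If the option reaches this code as a string rather than a real None, the OTLP exporters are never constructed, which would match the "traces go nowhere" symptom.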

Issue Type

Bug Report

Component Name

opentelemetry callback

Ansible Version

$ ansible --version
ansible [core 2.17.1]
  config file = /ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.12/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)] (/usr/bin/python3)
  jinja version = 3.1.4
  libyaml = True

Community.general Version

$ ansible-galaxy collection list community.general
Collection        Version
----------------- -------
community.general 9.1.0  

Configuration

$ ansible-config dump --only-changed

OS / Environment

No response

Steps to Reproduce

Ansible config:

    [defaults]
    callbacks_enabled = community.general.opentelemetry

Run the playbook:

    OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 ansible-playbook playbook.yml

Expected Results

Expect traces to be sent to endpoint

Actual Results

Traces are never forwarded

Code of Conduct

  • [X] I agree to follow the Ansible Code of Conduct

moserke avatar Jun 27 '24 21:06 moserke

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot avatar Jun 27 '24 21:06 ansibullbot

cc @v1v

click here for bot help

ansibullbot avatar Jun 27 '24 21:06 ansibullbot

https://github.com/ansible-collections/community.general/pull/8321 is the PR that introduced the support for the http exporter.

As far as I see, the change uses the same exporter by default.

Can you try to run the plugin with the explicit configuration entries?

ansible.cfg:

    [defaults]
    callbacks_enabled = community.general.opentelemetry
    [callback_opentelemetry]
    otel_exporter_otlp_traces_protocol = grpc
    store_spans_in_file = None

IIUC, you tried locally running OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317/ ansible-playbook playbook.yml against your OTEL collector, right?

v1v avatar Jun 28 '24 06:06 v1v

Tried setting all of the possible config options to their defaults in the ansible.cfg and still the same issue, it just simply isn't trying to send the traces. If I do a store_spans_in_file=/dev/stdout instead just to see, it prints them to the screen, so I know it's tracing, it's just for some reason not sending to the otlp endpoint...

moserke avatar Jul 01 '24 13:07 moserke

Seeing the same issue here. Works nicely in 8.6, but silently stops sending traces in >=9.0.0.

rojon8 avatar Jul 12 '24 06:07 rojon8

I can see a few changes were added to v9.0:

  • https://github.com/ansible-collections/community.general/blob/stable-9/CHANGELOG.md#v9-0-0

IIUC, from the description, the issue might be related to supporting HTTP exporters and the existing GRPC support.

@wilfriedroset @russoz, since you worked and helped on https://github.com/ansible-collections/community.general/pull/8321, would you mind if I asked you to double-check if things work nicely on your end if you use >=9.0.0? 🙇

v1v avatar Jul 12 '24 10:07 v1v

Tested with 9.2.0: the problem persists.

8.6.3 works OK.

ansible [core 2.14.14]
  config file = /home/cervenka/.ansible.cfg
  configured module search path = ['/home/cervenka/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/cervenka/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.9.18 (main, Jan 4 2024, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

cervajs avatar Jul 16 '24 13:07 cervajs

> I can see a few changes were added to v9.0:
>
> IIUC, from the description, the issue might be related to supporting HTTP exporters and the existing GRPC support.
>
> @wilfriedroset @russoz, since you worked and helped on #8321, would you mind if I asked you to double-check if things work nicely on your end if you use >=9.0.0? 🙇

Hi @v1v I pretty much helped review it from a Python/Ansible perspective, I am not familiar enough with OpenTelemetry to make a call on the plugin logic.

@wilfriedroset Would it be possible for you to double check the code change? TIA

russoz avatar Jul 17 '24 05:07 russoz

I have just reviewed the changes in that PR, and to the best of my ability I could not find anything that would be a problem. There are 4 other PRs after #8321 that might have introduced a problem (I have not cross-referenced them with the version tag, so probably not all of them apply).

russoz avatar Jul 17 '24 05:07 russoz

I've merged #8741, would be great if someone could verify that it fixes this bug.

felixfontein avatar Aug 12 '24 05:08 felixfontein

@felixfontein

With this version we only get the trace, without any spans. If we use community.general < 9.0.0, all the spans are reported correctly.

OneCyrus avatar Sep 20 '24 06:09 OneCyrus

@wilfriedroset @v1v ^

felixfontein avatar Sep 20 '24 17:09 felixfontein

Friendly push: does anyone have a pointer to the cause of this?

@wilfriedroset @v1v

OneCyrus avatar Dec 19 '24 07:12 OneCyrus

Sorry for the radio silence;

I cannot reproduce the missing traces/spans error with the latest changes in main.

How did I test this out?

I've been using the latest changes for the otel ansible plugin and testing against an OTEL Collector that has been configured with the Elastic exporter

OTEL collector config

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/elastic:
    endpoint: "${env:APM_URL}"
    headers:
      Authorization: "Bearer ${env:APM_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/elastic]
    logs:
      receivers: [otlp]
      exporters: [otlp/elastic]

Then I ran docker compose with the below settings:

docker-compose.yml

---
services:

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    platform: linux/arm64
    volumes:
      - ./config/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "1888:1888"   # pprof extension
      - "13133:13133" # health_check extension
      - "4317:4317"   # OTLP gRPC receiver
      - "55670:55679" # zpages extension
    environment:
      APM_URL: ${APM_URL}
      APM_TOKEN: ${APM_TOKEN}
    networks:
      - otel

volumes:
  otel:
    driver: local

networks:
  otel:

and ran:

$ OTEL_EXPORTER_OTLP_INSECURE=true \
	OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 \
	ansible-playbook playbook.yml

and so far so good in both cases OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 and OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317.

My current environment is:

ansible [core 2.16.6]
  config file = /Users/vmartinez/workspaces/v1v/its-ansible-otel/ansible.cfg
  configured module search path = ['/Users/vmartinez/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/vmartinez/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/ansible
  python version = 3.12.8 (main, Dec  3 2024, 18:42:41) [Clang 16.0.0 (clang-1600.0.26.4)] (/Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/python)
  jinja version = 3.1.4
  libyaml = True
Package                                  Version
---------------------------------------- --------
ansible                                  9.5.1
ansible-core                             2.16.6
certifi                                  2024.2.2
cffi                                     1.16.0
charset-normalizer                       3.3.2
cryptography                             42.0.7
Deprecated                               1.2.14
docker                                   7.0.0
googleapis-common-protos                 1.63.0
grpcio                                   1.63.0
idna                                     3.7
importlib-metadata                       7.0.0
iniconfig                                2.0.0
Jinja2                                   3.1.4
MarkupSafe                               2.1.5
opentelemetry-api                        1.24.0
opentelemetry-exporter-otlp              1.24.0
opentelemetry-exporter-otlp-proto-common 1.24.0
opentelemetry-exporter-otlp-proto-grpc   1.24.0
opentelemetry-exporter-otlp-proto-http   1.24.0
opentelemetry-proto                      1.24.0
opentelemetry-sdk                        1.24.0
opentelemetry-semantic-conventions       0.45b0
packaging                                24.0
pip                                      24.0
pluggy                                   1.5.0
protobuf                                 4.25.3
pycparser                                2.22
pytest                                   8.2.0
PyYAML                                   6.0.1
requests                                 2.31.0
resolvelib                               1.0.1
typing_extensions                        4.11.0
urllib3                                  2.2.1
wrapt                                    1.16.0
zipp                                     3.18.1

OTEL Collector image labels:

"org.opencontainers.image.source": "https://github.com/open-telemetry/opentelemetry-collector-releases",
"org.opencontainers.image.version": "0.100.0"

If I update those dependencies, it works too:

ansible [core 2.18.1]
  config file = /Users/vmartinez/workspaces/v1v/its-ansible-otel/ansible.cfg
  configured module search path = ['/Users/vmartinez/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/vmartinez/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/ansible
  python version = 3.12.8 (main, Dec  3 2024, 18:42:41) [Clang 16.0.0 (clang-1600.0.26.4)] (/Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/python)
  jinja version = 3.1.5
  libyaml = True
Package                                  Version
---------------------------------------- ----------
ansible                                  11.1.0
ansible-core                             2.18.1
certifi                                  2024.12.14
cffi                                     1.17.1
charset-normalizer                       3.4.1
cryptography                             44.0.0
Deprecated                               1.2.15
googleapis-common-protos                 1.66.0
grpcio                                   1.68.1
idna                                     3.10
importlib_metadata                       8.5.0
iniconfig                                2.0.0
Jinja2                                   3.1.5
MarkupSafe                               3.0.2
opentelemetry-api                        1.29.0
opentelemetry-exporter-otlp              1.29.0
opentelemetry-exporter-otlp-proto-common 1.29.0
opentelemetry-exporter-otlp-proto-grpc   1.29.0
opentelemetry-exporter-otlp-proto-http   1.29.0
opentelemetry-proto                      1.29.0
opentelemetry-sdk                        1.29.0
opentelemetry-semantic-conventions       0.50b0
packaging                                24.2
pip                                      24.3.1
pluggy                                   1.5.0
protobuf                                 5.29.2
pycparser                                2.22
pytest                                   8.3.4
PyYAML                                   6.0.2
requests                                 2.32.3
resolvelib                               1.0.1
typing_extensions                        4.12.2
urllib3                                  2.3.0
wrapt                                    1.17.0
zipp                                     3.21.0
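
For anyone comparing their environment against the two package lists above, a small sketch that prints only the installed OpenTelemetry distributions. It uses the standard library's importlib.metadata; the function name otel_versions is hypothetical. Run it with the same Python interpreter that ansible reports under "python version".

```python
from importlib import metadata


def otel_versions():
    """Return {distribution name: version} for installed opentelemetry-* packages."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        # Skip distributions with broken or missing metadata.
        if name and name.lower().startswith("opentelemetry"):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    versions = otel_versions()
    if not versions:
        print("no opentelemetry packages found in this environment")
    for name, version in versions.items():
        print(f"{name} {version}")
```

An empty result means the interpreter running Ansible cannot import the exporters at all, which is worth ruling out before looking at the collector side.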

If you'd like to reuse what I've done, https://github.com/v1v/otel-ansible-callback-plugin/pull/2 might help you - you can configure another OTEL vendor.

Please let me know which vendors you see it failing with.

v1v avatar Dec 31 '24 12:12 v1v

However, if I use the latest container (0.116.1) for the OTEL Collector:

"org.opencontainers.image.created": "2024-12-17T21:09:34Z",
"org.opencontainers.image.licenses": "Apache-2.0",
"org.opencontainers.image.name": "opentelemetry-collector-releases",
"org.opencontainers.image.revision": "62dfc10402322ae4e2cdbdd92a0c0cc797f1b1f4",
"org.opencontainers.image.source": "https://github.com/open-telemetry/opentelemetry-collector-releases",
"org.opencontainers.image.version": "0.116.1"

then the same setup does not work:

Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 8s.
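
A gRPC UNAVAILABLE status usually points at the connection rather than at the payload, so a plain TCP check against the endpoint can help narrow it down before touching the plugin. A minimal sketch, assuming the endpoint is written either as host:port or as a URL, as in the commands above. The helper names split_endpoint and can_connect are hypothetical, and the fallback port 4317 is just the gRPC receiver port used in this thread.

```python
import socket
from urllib.parse import urlsplit


def split_endpoint(endpoint, default_port=4317):
    """Split 'localhost:4317' or 'http://localhost:4317/' into (host, port)."""
    # urlsplit only recognises the host when a scheme or '//' prefix is present.
    parts = urlsplit(endpoint if "://" in endpoint else "//" + endpoint)
    return parts.hostname, parts.port or default_port


def can_connect(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = split_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    import os

    target = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "localhost:4317")
    print(target, "reachable:", can_connect(target))
```

A successful TCP connection does not prove the collector accepts OTLP on that port, but a failed one means the problem lies in networking or the collector's listener rather than in the callback plugin.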

Regardless, https://github.com/ansible-collections/community.general/blob/main/plugins/callback/opentelemetry.py works fine if I use OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS without the OTEL Collector itself:

OTEL_EXPORTER_OTLP_INSECURE=true \
	OTEL_EXPORTER_OTLP_ENDPOINT=https://*****.elastic-cloud.com:443 \
	OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer *****" \
	ansible-playbook playbook.yml
[...]

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

v1v avatar Dec 31 '24 12:12 v1v

We can close this issue. So far I have not been able to reproduce the issue after the fix done at https://github.com/ansible-collections/community.general/issues/8566#issuecomment-2283148803

v1v avatar Jan 07 '25 21:01 v1v

@moserke any objection to that?

russoz avatar Jan 07 '25 21:01 russoz

needs_info

russoz avatar Jan 08 '25 08:01 russoz

Thanks @russoz. Sounds good to me. My apologies for missing all of these.

moserke avatar Jan 08 '25 16:01 moserke

It started to work again with one of the latest versions. Looks good here too.

OneCyrus avatar Jan 08 '25 16:01 OneCyrus

@v1v since you're a maintainer for this plugin you can write close_me in a comment to make the bot close the issue. (https://github.com/ansible/ansibullbot/blob/devel/ISSUE_HELP.md#commands) (I won't close it now so you can try it out ;-) )

felixfontein avatar Jan 08 '25 19:01 felixfontein

close_me

v1v avatar Jan 08 '25 20:01 v1v