monitoring: enable tracing self instrumentation in APM Server

Open 1pkg opened this issue 1 year ago • 5 comments

Motivation/summary

This PR addresses the issue https://github.com/elastic/apm-server/issues/14230 and enables self instrumentation in APM Server.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

For detailed steps to test these changes in a self-hosted APM Server, refer to https://github.com/elastic/apm-server/pull/13514#issuecomment-2205243038

Related issues

https://github.com/elastic/apm-server/issues/14230

1pkg avatar Oct 02 '24 00:10 1pkg

This pull request does not have a backport label. Could you fix it @1pkg? 🙏 To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.

mergify[bot] avatar Oct 02 '24 00:10 mergify[bot]

backport-8.x has been added to help with the transition to the new branch 8.x. If you don't need it please use backport-skip label.

mergify[bot] avatar Oct 02 '24 00:10 mergify[bot]

While testing the changes manually, I discovered a bug in the EA tracing sampling changes: sampling_rate is formatted as a binary (hex) literal with an exponent instead of a decimal literal without an exponent. This breaks the subsequent call in apm-agent-go when the tracer is initialized with a custom sampling rate.

image
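To illustrate the class of bug described above, here is a minimal Go sketch. It assumes the value is rendered with strconv.FormatFloat and later read back with strconv.ParseFloat; both are assumptions for illustration and not confirmed by this thread. The helper names formatRate and parsesAsDecimal are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatRate renders a sampling rate with the given strconv format verb.
// 'b' yields a mantissa with a binary exponent (digits, then "p", then exponent);
// 'f' yields a plain decimal literal without an exponent.
// Hypothetical helper, not taken from elastic-agent.
func formatRate(rate float64, verb byte) string {
	return strconv.FormatFloat(rate, verb, -1, 64)
}

// parsesAsDecimal reports whether strconv.ParseFloat accepts s.
// Assumption: the consumer reads the rate back with ParseFloat.
func parsesAsDecimal(s string) bool {
	_, err := strconv.ParseFloat(s, 64)
	return err == nil
}

func main() {
	for _, rate := range []float64{1, 0.5} {
		bin := formatRate(rate, 'b')
		dec := formatRate(rate, 'f')
		fmt.Printf("rate=%v binary=%q parses=%t decimal=%q parses=%t\n",
			rate, bin, parsesAsDecimal(bin), dec, parsesAsDecimal(dec))
	}
}
```

Under these assumptions, the 'b' form is rejected on the way back in because a "p" exponent is only valid after a 0x prefix, while the 'f' form round-trips.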

1pkg avatar Oct 05 '24 00:10 1pkg

Raised an issue in the agent repo https://github.com/elastic/elastic-agent/issues/5711

1pkg avatar Oct 05 '24 00:10 1pkg

Can't be merged yet because the Cloud configuration is missing a sensible default for sampling_rate.

1pkg avatar Oct 14 '24 16:10 1pkg

This pull request now has conflicts. Could you fix it @1pkg? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enable-self-instrumentation-tracing upstream/enable-self-instrumentation-tracing
git merge upstream/main
git push upstream enable-self-instrumentation-tracing

mergify[bot] avatar Oct 21 '24 08:10 mergify[bot]

Testing on ESS

Tested successfully with 8.16 BC2; traces can be visualized (screenshots attached).

Screenshot 2024-10-29 at 10 27 37 PM
Screenshot 2024-10-29 at 10 36 18 PM

Testing on Standalone

Tested successfully with commit SHA ae423f0e4e22643f9c02b489a4fdaeca76bcf26e

Ran the elastic-agent locally with the following docker-compose and elastic-agent.yml:

version: '3.9'
services:
  elastic-agent:
    image: elastic-agent-systemtest:8.16.0-0e756d08-SNAPSHOT
    networks:
      - network1
    ports:
      - 8220:8220
      - 8200:8200
    user: root
    environment:
      FLEET_SERVER_ENABLE: 'true'
      FLEET_SERVER_ELASTICSEARCH_HOST: http://elasticsearch:9200
      FLEET_SERVER_SERVICE_TOKEN: AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuLTE3MzAyNDE5NTk0OTA6bFo5TnNWMlFST3VPQmp1Q1MxVC1kQQ
      FLEET_SERVER_POLICY_ID: fleet-server-policy
      FLEET_SERVER_PORT: 8220
networks:
  network1:
    name: apm-server_default
    external: true

elastic-agent.yml:

fleet:
  enabled: true

agent.monitoring:
  # enabled turns on monitoring of running processes
  enabled: true
  traces: true
  apm:
    environment: "816bctesting"
    global_labels:
      foo: localbctesting
    hosts:
      - https://88e3f5<REDACTED>.elastic-cloud.com:443 # this is a personal ESS cluster where I am sending monitoring data
    secret_token: <REDACTED>
    sampling_rate: 1

outputs:
  default:
    type: elasticsearch
    hosts:
      - 'http://elasticsearch:9200'
    username: admin
    password: changeme

Observed traces on 88e3f5 (configured monitoring cluster):

Screenshot 2024-10-29 at 11 16 57 PM
Screenshot 2024-10-29 at 11 17 33 PM

lahsivjar avatar Oct 29 '24 23:10 lahsivjar