apm-agent-python

Starlette Application Stops When ElasticAPM Server is Unreachable

Open Impro02 opened this issue 1 year ago • 5 comments

Describe the bug: When integrating ElasticAPM middleware with a Starlette application, the application ceases to function if the ElasticAPM server is unreachable. This occurs under various circumstances, such as an incorrect URL, proxy issues, or certificate verification failures.

To Reproduce

  • Set up a Starlette application with ElasticAPM middleware.
  • Configure the ElasticAPM middleware to point to a non-existent APM server URL (to simulate server downtime).
  • Start the Starlette application.
  • Observe logging output indicating that the APM server cannot be reached.
  • After approximately 30 seconds, the application stops.

Environment (please complete the following information)

  • OS: Linux/Windows
  • Python version: 3.11
  • Framework and version: Starlette/FastAPI
  • APM Server version:
  • Agent version:

Additional context

This issue poses a significant risk to application reliability, as any downtime or configuration issue with the ElasticAPM server directly affects the availability of all services using the Starlette framework with ElasticAPM middleware.

  • Agent config options
    apm = _make_apm_client(
        {
            "SERVICE_NAME": "XX",
            "ENVIRONMENT": "DEV",
            "SERVER_URL": "https://...",
            "SERVER_CERT": "path_to_cer",
            "VERIFY_SERVER_CERT": True,
        }
    )
    app.add_middleware(_ElasticAPM, client=apm)
  • requirements.txt:
    fastapi>=0.108.0
    elastic-apm

Impro02 avatar Apr 25 '24 09:04 Impro02

Thanks for reporting

xrmx avatar Apr 26 '24 07:04 xrmx

The link on the title doesn't work.

  • This is the one meant: https://github.com/encode/starlette/discussions/2571

Kludex avatar Apr 26 '24 07:04 Kludex

@Kludex thanks!

xrmx avatar Apr 26 '24 07:04 xrmx

@Impro02 I tried to reproduce this with the example you provided in the Starlette discussion, using the following packages:

anyio==4.3.0
certifi==2024.2.2
click==8.1.7
ecs-logging==2.1.0
elastic-apm==6.22.0
h11==0.14.0
idna==3.7
sniffio==1.3.1
starlette==0.37.2
urllib3==2.2.1
uvicorn==0.29.0
wrapt==1.14.1

With both Python 3.10.12 and 3.11.9, the app still responds on the /status endpoint after several minutes.

xrmx avatar Apr 30 '24 10:04 xrmx
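
For anyone trying to confirm the same thing, a check like "still responding after minutes" can be scripted with the standard library alone. This is only a sketch, not what was used in this thread: the function names `is_responding` and `watch` are made up for illustration, and the address in the usage note is an assumption about where the app listens.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_responding(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` sends back any HTTP response within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (4xx/5xx), so it is still alive.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, broken response: nothing usable answered.
        return False


def watch(url: str, duration: float = 120.0, interval: float = 5.0) -> list[tuple[float, bool]]:
    """Poll `url` roughly every `interval` seconds for `duration` seconds.

    Returns a list of (elapsed_seconds, alive) samples.
    """
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((round(elapsed, 1), is_responding(url)))
        if elapsed + interval > duration:
            break
        time.sleep(interval)
    return samples
```

With the app running, `watch("http://127.0.0.1:8000/status")` (hypothetical host and port) returns one sample every five seconds for two minutes. If the app stops as described in the report, the samples should flip from `True` to `False` around the 30 second mark.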

@xrmx After more investigation, the issue seems to be triggered by the psutil library. In my tests with the following environment, the app crashes once psutil is installed.

I use Python 3.12.1.

anyio==4.3.0
certifi==2024.2.2
click==8.1.7
colorama==0.4.6
ecs-logging==2.1.0
elastic-apm==6.22.0
h11==0.14.0
idna==3.7
psutil==5.9.8
setuptools==68.2.2
sniffio==1.3.1
starlette==0.37.2
urllib3==2.2.1
uvicorn==0.29.0
wheel==0.41.2
wrapt==1.14.1

Impro02 avatar May 02 '24 08:05 Impro02
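
Since the two environments above differ only in the Python patch version and a few packages, it can help to print both in a comparable form. A minimal sketch using only the standard library follows. The function name `environment_report` and the default package list are assumptions chosen for this thread, not part of the agent.

```python
import platform
from importlib import metadata


def environment_report(
    packages: tuple[str, ...] = ("elastic-apm", "psutil", "starlette", "uvicorn"),
) -> dict[str, str]:
    """Return the interpreter version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Make a missing package explicit rather than omitting it.
            report[name] = "not installed"
    return report


for name, version in environment_report().items():
    print(f"{name}: {version}")
```

Running this in both the failing and the working environment shows at a glance whether psutil is present and which Python patch release is in use.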

OK, so this looks more metrics-related.

I still cannot reproduce it with Python 3.12.3. Have you tried updating your Python version?

xrmx avatar May 02 '24 10:05 xrmx

Using Python 3.12.3 seems to fix the issue.

Impro02 avatar May 02 '24 12:05 Impro02

Closing then.

xrmx avatar May 02 '24 14:05 xrmx