apm-agent-python
Starlette Application Stops When ElasticAPM Server is Unreachable
Describe the bug: When integrating ElasticAPM middleware with a Starlette application, the application ceases to function if the ElasticAPM server is unreachable. This occurs under various circumstances, such as an incorrect URL, proxy issues, or certificate verification failures.
To Reproduce
- Set up a Starlette application with ElasticAPM middleware.
- Configure the ElasticAPM middleware to point to a non-existent APM server URL (to simulate server downtime).
- Start the Starlette application.
- Observe logging output indicating that the APM server cannot be reached.
- After approximately 30 seconds, the application stops.
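One way to time the last step above is to poll the application from a separate process and record when it stops answering. The following is only a sketch using the standard library; the helper names `is_responding` and `seconds_until_unresponsive`, the `/status` path, and the host and port in the usage comment are assumptions for illustration, not part of the agent or of Starlette.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def is_responding(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` yields a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a broken response.
        return False


def seconds_until_unresponsive(
    check: Callable[[], bool],
    max_wait: float = 120.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `check` until it fails.

    Returns the elapsed seconds at the first failed check, or None if
    `check` kept succeeding until `max_wait` seconds had passed.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if elapsed >= max_wait:
            return None
        if not check():
            return elapsed
        sleep(interval)


# Hypothetical usage (URL is an assumption; adjust to your app):
#   url = "http://127.0.0.1:8000/status"
#   print(seconds_until_unresponsive(lambda: is_responding(url)))
```

A result of `None` means the application was still responding when the wait limit was reached; a number is the approximate time in seconds at which the first request failed.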
Environment (please complete the following information)
- OS: Linux/Windows
- Python version: 3.11
- Framework and version: Starlette/FastAPI
- APM Server version:
- Agent version:
Additional context
This issue poses a significant risk to application reliability, as any downtime or configuration issue with the ElasticAPM server directly affects the availability of all services using the Starlette framework with ElasticAPM middleware.
- Agent config options
apm = _make_apm_client(
    {
        "SERVICE_NAME": "XX",
        "ENVIRONMENT": "DEV",
        "SERVER_URL": "https://...",
        "SERVER_CERT": "path_to_cer",
        "VERIFY_SERVER_CERT": True,
    }
)
app.add_middleware(_ElasticAPM, client=apm)
- requirements.txt:
fastapi>=0.108.0
elastic-apm
Thanks for reporting
The link in the title doesn't work.
- This is the intended one: https://github.com/encode/starlette/discussions/2571
@Kludex thanks!
@Impro02 I'm trying to replicate it with the example you provided on the starlette discussion with the following packages:
anyio==4.3.0
certifi==2024.2.2
click==8.1.7
ecs-logging==2.1.0
elastic-apm==6.22.0
h11==0.14.0
idna==3.7
sniffio==1.3.1
starlette==0.37.2
urllib3==2.2.1
uvicorn==0.29.0
wrapt==1.14.1
With both Python 3.10.12 and 3.11.9, the app is still responding to the /status endpoint after several minutes.
@xrmx After more investigation, the issue seems to be triggered by the psutil library. In my tests with the following environment, the app crashes once psutil is installed.
I use Python 3.12.1
anyio==4.3.0
certifi==2024.2.2
click==8.1.7
colorama==0.4.6
ecs-logging==2.1.0
elastic-apm==6.22.0
h11==0.14.0
idna==3.7
psutil==5.9.8
setuptools==68.2.2
sniffio==1.3.1
starlette==0.37.2
urllib3==2.2.1
uvicorn==0.29.0
wheel==0.41.2
wrapt==1.14.1
Ok, so this looks more like metrics related.
I still cannot reproduce with Python 3.12.3. Have you tried updating your Python version?
Using Python 3.12.3 seems to fix the issue.
Closing then.