High CPU usage on startup
Summary of problem
During start of our applications CPU usage is going 3x from usual utilization, after few mins it returns back to normal state, but it does affect our autoscaling.
When we are starting app without ddtrace-run command CPU usage spikes on start does not appear.
Python version: 3.12
Which version of dd-trace-py are you using?
2.8.1
Which version of pip are you using?
24.0
Which libraries and their versions are you using?
`pip freeze`
aiohttp==3.9.3
aiosignal==1.3.1
amqp==5.2.0
annotated-types==0.6.0
anyio==4.3.0
asgiref==3.7.2
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
billiard==4.2.0
boto3==1.34.71
botocore==1.34.71
bytecode==0.15.1
cachetools==5.3.3
cattrs==23.2.3
celery==5.3.6
certifi==2024.2.2
cffi==1.16.0
channels==4.0.0
channels-demultiplexer==2.2.0
channels-redis==4.1.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
clickhouse-driver==0.2.7
cron-descriptor==1.4.3
cryptography==42.0.5
curlify==2.2.1
ddsketch==3.0.1
ddtrace==2.8.1
defusedxml==0.7.1
Deprecated==1.2.14
dicttoxml==1.7.16
Django==5.0.3
django-admin-autocomplete-filter==0.7.1
django-annoying==0.10.6
django-appconf==1.0.6
django-celery-beat==2.6.0
django-celery-results==2.3.1
django-cors-headers==4.3.1
django-debug-toolbar==4.3.0
django-deprecate-fields==0.1.2
django-dirtyfields==1.9.2
django-elasticsearch-dsl==7.4
django-elasticsearch-dsl-drf==0.22.5
django-environ==0.11.2
django-extensions==3.2.3
django-filter==24.1
django-fsm==2.8.1
django-health-check==3.18.1
django-model-utils==4.4.0
django-nine==0.2.7
django-redis==5.4.0
django-redis-sessions==0.6.2
django-redis-sessions-fork==0.7.0
django-split-settings==1.3.0
django-storages==1.14.2
django-timezone-field==6.1.0
djangorestframework==3.14.0
djangorestframework-simplejwt==5.3.1
drf-extensions==0.7.1
drf-spectacular==0.27.1
elasticsearch==7.13.4
elasticsearch-dsl==7.4.1
envier==0.5.1
facebook_business==19.0.2
factory-boy==3.3.0
Faker==23.3.0
fakeredis==2.21.3
frozenlist==1.4.1
ga4mp==2.0.4
google-auth==2.29.0
google-measurement-protocol==1.1.0
growthbook==1.0.0
gunicorn==21.2.0
h11==0.14.0
hiredis==2.3.2
httptools==0.6.1
hurry.filesize==0.9
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
jmespath==0.10.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kombu==5.3.5
lupa==2.0
maxminddb==2.6.0
maxminddb-geolite2==2018.703
mongoengine==0.27.0
mongomock==4.1.2
msgpack==1.0.8
msgspec==0.18.6
multidict==6.0.5
oauthlib==3.2.2
opentelemetry-api==1.23.0
packaging==24.0
paypalrestsdk==1.13.3
phonenumbers==8.13.33
pillow==10.2.0
prices==1.1.1
prompt-toolkit==3.0.43
protobuf==5.26.0
psutil==5.9.8
psycopg==3.1.18
psycopg-binary==3.1.18
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycountry==23.12.11
pycparser==2.21
pycryptodome==3.20.0
pydantic==2.6.4
pydantic_core==2.16.3
PyJWT==2.8.0
pymongo==3.13.0
pyOpenSSL==24.1.0
pyotp==2.9.0
python-crontab==3.0.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-http-client==3.3.7
python-json-logger==2.0.7
python3-openid==3.2.0
pytz==2024.1
PyYAML==6.0.1
redis==5.0.3
referencing==0.34.0
reportlab==4.1.0
requests==2.31.0
requests-oauthlib==1.4.0
rpds-py==0.18.0
rsa==4.9
s3transfer==0.10.1
sendgrid==6.11.0
sentinels==1.0.0
sentry-sdk==1.43.0
setuptools==69.2.0
six==1.16.0
sniffio==1.3.1
social-auth-app-django==5.4.0
social-auth-core==4.5.3
solidgate-card-sdk==0.1.0
sortedcontainers==2.4.0
soupsieve==2.5
sqlparse==0.4.4
starkbank-ecdsa==2.2.0
stripe==8.8.0
twilio==7.17.0
typing_extensions==4.10.0
tzdata==2024.1
tzlocal==5.2
ua-parser==0.18.0
ujson==5.9.0
uritemplate==4.1.1
urllib3==1.26.18
user-agents==2.2.0
uvicorn==0.27.1
uvloop==0.19.0
vine==5.1.0
watchfiles==0.21.0
wcwidth==0.2.13
websockets==12.0
wrapt==1.16.0
xmltodict==0.13.0
yarl==1.9.4
zipp==3.18.1
How can we reproduce your problem?
Run big app with ddtrace-run
What is the result that you get?
High CPU usage on application start
What is the result that you expected?
CPU usage spike on start is not happening or at least decreased as much as possible
Hello @ruslansuprun thank you for reporting this issue.
Can you tell us what products or environment variables you have enabled? For example DD_IAST_ENABLED or DD_PROFILING_ENABLED
Sure
Here is set of DD_* env variables that we are using:
DD_AGENT_HOST: xxx
DD_SERVICE: xxx
DD_VERSION: xxx
DD_ENV: xxx
DD_LOGS_INJECTION: true
DD_PROFILING_ENABLED: true
DD_SERVICE_MAPPING: xxx
DD_TRACE_ENABLED: true
DD_TRACE_SAMPLE_RATE: 0.1
Thank you,
This CPU spike, does it also happen if you disable DD_LOGS_INJECTION or DD_PROFILING_ENABLED?
The spike does not happen in 2.8.0 with the same env variables?
It's happening when DD_PROFILING_ENABLED is true, when it's false CPU usage spikes not happening
DD_LOGS_INJECTION value does not affect anything related to CPU usage
We've upgraded from 2.7.4 and issue was present there too
So we've decided to upgrade to latest version to see if issue is still there
@ruslansuprun Definitely looks like a profiling issue. Since you're on a recent version of 2.7/2.8, could you try enabling the DD_PROFILING_EXPORT_LIBDD_ENABLED=true environment variable? Other customers with similar issues on 3.12 have seen improvement with this. Thank you!
@sanchda so this new env variable should be added along with DD_PROFILING_ENABLED=true or replacing this env variable with DD_PROFILING_EXPORT_LIBDD_ENABLED=true?
@ruslansuprun It's in addition to. It modifies the behavior of the profiler, but the profiler won't be enabled without specifying it explicitly.
@sanchda tried this, small improvements are in place, but still CPU usage on start is too high 😢 Disabled profiling for now
Did some head-scratching over this. @ruslansuprun, if you're up for another experiment, I see that in some cases the profiler startup time can be negatively impacted by code provenance ("My Code" in the UI). If you set DD_PROFILING_ENABLE_CODE_PROVENANCE=false, this might help startup times.
If it doesn't help, then I'd be very interested in trying to characterize how the profiler is wasting time. You may not be the only one with this issue.
@sanchda sorry for a long response, will test it in few days and will return back with feedback
@sanchda hi
I've tried enabling profiling with DD_PROFILING_ENABLE_CODE_PROVENANCE=false option but still no luck with decrease of CPU usage on start
We are running 2 services
First one using gunicorn to run with 12 workers per container, workers amount is set by env variable
Options for gunicorn: gunicorn <app> -b 0.0.0.0:8000 -t 300 --max-requests 100000 --max-requests-jitter 20000 --keep-alive 60 --graceful-timeout 60 --preload
Second on is using uvicorn with 10 workers per container, workers amount is set by env variable
Options for uvicorn: uvicorn <app> --host 0.0.0.0 --port 5000 --no-access-log --proxy-headers --forwarded-allow-ips * --lifespan off --timeout-keep-alive 60 --timeout-graceful-shutdown 60
@ruslansuprun , sorry for letting this get stale. I'm working on some features which should help reduce overhead, but it might be a while before they're stable enough for you to use. I don't really have any guidance to offer. I'll remove the stale label for now. and leave this open since it is still a problem.
This issue has been automatically closed after a period of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.