
High CPU usage on startup

Open ruslansuprun opened this issue 1 year ago • 13 comments

Summary of problem

During startup of our applications, CPU usage goes to roughly 3x the usual utilization; after a few minutes it returns to normal, but it does affect our autoscaling. When we start the app without the ddtrace-run command, the CPU spike on startup does not appear.
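
For reference, a minimal sketch of the two launch modes being compared (illustrative only; the `myproject.wsgi` module path and the port are placeholders, not our real values):

```sh
# Launch wrapped with ddtrace-run: CPU goes to roughly 3x normal for a few minutes on startup
ddtrace-run gunicorn myproject.wsgi -b 0.0.0.0:8000

# Same app launched without ddtrace-run: no startup spike observed
gunicorn myproject.wsgi -b 0.0.0.0:8000
```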

Python version: 3.12

Which version of dd-trace-py are you using?

2.8.1

Which version of pip are you using?

24.0

Which libraries and their versions are you using?

`pip freeze`
aiohttp==3.9.3
aiosignal==1.3.1
amqp==5.2.0
annotated-types==0.6.0
anyio==4.3.0
asgiref==3.7.2
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
billiard==4.2.0
boto3==1.34.71
botocore==1.34.71
bytecode==0.15.1
cachetools==5.3.3
cattrs==23.2.3
celery==5.3.6
certifi==2024.2.2
cffi==1.16.0
channels==4.0.0
channels-demultiplexer==2.2.0
channels-redis==4.1.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
clickhouse-driver==0.2.7
cron-descriptor==1.4.3
cryptography==42.0.5
curlify==2.2.1
ddsketch==3.0.1
ddtrace==2.8.1
defusedxml==0.7.1
Deprecated==1.2.14
dicttoxml==1.7.16
Django==5.0.3
django-admin-autocomplete-filter==0.7.1
django-annoying==0.10.6
django-appconf==1.0.6
django-celery-beat==2.6.0
django-celery-results==2.3.1
django-cors-headers==4.3.1
django-debug-toolbar==4.3.0
django-deprecate-fields==0.1.2
django-dirtyfields==1.9.2
django-elasticsearch-dsl==7.4
django-elasticsearch-dsl-drf==0.22.5
django-environ==0.11.2
django-extensions==3.2.3
django-filter==24.1
django-fsm==2.8.1
django-health-check==3.18.1
django-model-utils==4.4.0
django-nine==0.2.7
django-redis==5.4.0
django-redis-sessions==0.6.2
django-redis-sessions-fork==0.7.0
django-split-settings==1.3.0
django-storages==1.14.2
django-timezone-field==6.1.0
djangorestframework==3.14.0
djangorestframework-simplejwt==5.3.1
drf-extensions==0.7.1
drf-spectacular==0.27.1
elasticsearch==7.13.4
elasticsearch-dsl==7.4.1
envier==0.5.1
facebook_business==19.0.2
factory-boy==3.3.0
Faker==23.3.0
fakeredis==2.21.3
frozenlist==1.4.1
ga4mp==2.0.4
google-auth==2.29.0
google-measurement-protocol==1.1.0
growthbook==1.0.0
gunicorn==21.2.0
h11==0.14.0
hiredis==2.3.2
httptools==0.6.1
hurry.filesize==0.9
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
jmespath==0.10.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kombu==5.3.5
lupa==2.0
maxminddb==2.6.0
maxminddb-geolite2==2018.703
mongoengine==0.27.0
mongomock==4.1.2
msgpack==1.0.8
msgspec==0.18.6
multidict==6.0.5
oauthlib==3.2.2
opentelemetry-api==1.23.0
packaging==24.0
paypalrestsdk==1.13.3
phonenumbers==8.13.33
pillow==10.2.0
prices==1.1.1
prompt-toolkit==3.0.43
protobuf==5.26.0
psutil==5.9.8
psycopg==3.1.18
psycopg-binary==3.1.18
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycountry==23.12.11
pycparser==2.21
pycryptodome==3.20.0
pydantic==2.6.4
pydantic_core==2.16.3
PyJWT==2.8.0
pymongo==3.13.0
pyOpenSSL==24.1.0
pyotp==2.9.0
python-crontab==3.0.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-http-client==3.3.7
python-json-logger==2.0.7
python3-openid==3.2.0
pytz==2024.1
PyYAML==6.0.1
redis==5.0.3
referencing==0.34.0
reportlab==4.1.0
requests==2.31.0
requests-oauthlib==1.4.0
rpds-py==0.18.0
rsa==4.9
s3transfer==0.10.1
sendgrid==6.11.0
sentinels==1.0.0
sentry-sdk==1.43.0
setuptools==69.2.0
six==1.16.0
sniffio==1.3.1
social-auth-app-django==5.4.0
social-auth-core==4.5.3
solidgate-card-sdk==0.1.0
sortedcontainers==2.4.0
soupsieve==2.5
sqlparse==0.4.4
starkbank-ecdsa==2.2.0
stripe==8.8.0
twilio==7.17.0
typing_extensions==4.10.0
tzdata==2024.1
tzlocal==5.2
ua-parser==0.18.0
ujson==5.9.0
uritemplate==4.1.1
urllib3==1.26.18
user-agents==2.2.0
uvicorn==0.27.1
uvloop==0.19.0
vine==5.1.0
watchfiles==0.21.0
wcwidth==0.2.13
websockets==12.0
wrapt==1.16.0
xmltodict==0.13.0
yarl==1.9.4
zipp==3.18.1

How can we reproduce your problem?

Run a large application with ddtrace-run.

What is the result that you get?

High CPU usage on application start

What is the result that you expected?

The CPU usage spike on startup does not happen, or is at least reduced as much as possible.

ruslansuprun avatar Apr 16 '24 09:04 ruslansuprun

Hello @ruslansuprun thank you for reporting this issue.

Can you tell us what products or environment variables you have enabled? For example, DD_IAST_ENABLED or DD_PROFILING_ENABLED.

gnufede avatar Apr 16 '24 10:04 gnufede

Sure. Here is the set of DD_* environment variables that we are using:

DD_AGENT_HOST: xxx
DD_SERVICE: xxx
DD_VERSION: xxx
DD_ENV: xxx
DD_LOGS_INJECTION: true
DD_PROFILING_ENABLED: true
DD_SERVICE_MAPPING: xxx
DD_TRACE_ENABLED: true
DD_TRACE_SAMPLE_RATE: 0.1

ruslansuprun avatar Apr 16 '24 10:04 ruslansuprun

Thank you,

Does this CPU spike also happen if you disable DD_LOGS_INJECTION or DD_PROFILING_ENABLED?

And does the spike also occur on 2.8.0 with the same env variables?

gnufede avatar Apr 16 '24 10:04 gnufede

It happens when DD_PROFILING_ENABLED is true; when it's false, the CPU usage spike does not occur.

The DD_LOGS_INJECTION value does not affect CPU usage at all.

ruslansuprun avatar Apr 16 '24 12:04 ruslansuprun

We upgraded from 2.7.4, and the issue was present there too, so we decided to upgrade to the latest version to see if it persists.

ruslansuprun avatar Apr 16 '24 12:04 ruslansuprun

@ruslansuprun Definitely looks like a profiling issue. Since you're on a recent version of 2.7/2.8, could you try enabling the DD_PROFILING_EXPORT_LIBDD_ENABLED=true environment variable? Other customers with similar issues on 3.12 have seen improvement with this. Thank you!

sanchda avatar Apr 16 '24 15:04 sanchda

@sanchda so should this new env variable be added alongside DD_PROFILING_ENABLED=true, or should it replace that variable?

ruslansuprun avatar Apr 16 '24 15:04 ruslansuprun

@ruslansuprun It's in addition. The new variable modifies the behavior of the profiler, but the profiler itself won't be enabled unless you still set DD_PROFILING_ENABLED=true explicitly.
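
For example, a minimal sketch of the combined settings, assuming your usual gunicorn launch (the `<app>` and port are placeholders):

```sh
# Keep the profiler enabled as before...
export DD_PROFILING_ENABLED=true
# ...and additionally opt in to the libdd exporter
export DD_PROFILING_EXPORT_LIBDD_ENABLED=true

ddtrace-run gunicorn <app> -b 0.0.0.0:8000
```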

sanchda avatar Apr 16 '24 15:04 sanchda

@sanchda tried this; there are small improvements, but CPU usage on startup is still too high 😢 We've disabled profiling for now.

ruslansuprun avatar Apr 16 '24 15:04 ruslansuprun

Did some head-scratching over this. @ruslansuprun, if you're up for another experiment, I see that in some cases the profiler startup time can be negatively impacted by code provenance ("My Code" in the UI). If you set DD_PROFILING_ENABLE_CODE_PROVENANCE=false, this might help startup times.

If it doesn't help, then I'd be very interested in trying to characterize how the profiler is wasting time. You may not be the only one with this issue.
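
If it helps, a rough sketch of the experiment, plus one way to get a first look at where the startup time goes. The importtime approach is only a suggestion: it captures import-time work (importing `ddtrace.profiling.auto` is the programmatic way to start the profiler), not what the profiler's background threads do afterwards.

```sh
# Experiment: keep profiling enabled but turn off code provenance collection
export DD_PROFILING_ENABLED=true
export DD_PROFILING_ENABLE_CODE_PROVENANCE=false
ddtrace-run gunicorn <app> -b 0.0.0.0:8000

# Rough characterization of the profiler's import/startup cost in isolation;
# -X importtime prints per-import timings to stderr
python -X importtime -c "import ddtrace.profiling.auto" 2> importtime.log
```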

sanchda avatar Apr 22 '24 16:04 sanchda

@sanchda sorry for the slow response; I will test it in a few days and come back with feedback.

ruslansuprun avatar Apr 30 '24 08:04 ruslansuprun

@sanchda hi, I've tried enabling profiling with the DD_PROFILING_ENABLE_CODE_PROVENANCE=false option, but still no luck: CPU usage on startup has not decreased.

ruslansuprun avatar May 02 '24 08:05 ruslansuprun

We are running two services.

The first one uses gunicorn with 12 workers per container (the worker count is set by an env variable). Gunicorn options: `gunicorn <app> -b 0.0.0.0:8000 -t 300 --max-requests 100000 --max-requests-jitter 20000 --keep-alive 60 --graceful-timeout 60 --preload`

The second one uses uvicorn with 10 workers per container (the worker count is set by an env variable). Uvicorn options: `uvicorn <app> --host 0.0.0.0 --port 5000 --no-access-log --proxy-headers --forwarded-allow-ips * --lifespan off --timeout-keep-alive 60 --timeout-graceful-shutdown 60`
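
For completeness, a sketch of how these look when wrapped with ddtrace-run (the `<app>` values are placeholders as above; `--forwarded-allow-ips '*'` is quoted here only so the shell doesn't expand it):

```sh
# Service 1: gunicorn, 12 workers per container (worker count comes from an env var)
ddtrace-run gunicorn <app> -b 0.0.0.0:8000 -t 300 \
    --max-requests 100000 --max-requests-jitter 20000 \
    --keep-alive 60 --graceful-timeout 60 --preload

# Service 2: uvicorn, 10 workers per container (worker count comes from an env var)
ddtrace-run uvicorn <app> --host 0.0.0.0 --port 5000 --no-access-log \
    --proxy-headers --forwarded-allow-ips '*' --lifespan off \
    --timeout-keep-alive 60 --timeout-graceful-shutdown 60
```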

ruslansuprun avatar May 02 '24 08:05 ruslansuprun

@ruslansuprun, sorry for letting this get stale. I'm working on some features which should help reduce overhead, but it might be a while before they're stable enough for you to use. I don't really have any guidance to offer right now. I'll remove the stale label and leave this open, since it is still a problem.

sanchda avatar Jul 04 '24 20:07 sanchda

This issue has been automatically closed after a period of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.

github-actions[bot] avatar Dec 02 '24 00:12 github-actions[bot]