dd-trace-py
Intermittent `RuntimeError: the memalloc module was not started` error
Which version of dd-trace-py are you using?
ddtrace==0.57.0
What is the result that you get?
RuntimeError: the memalloc module was not started
What is the result that you expected?
No errors.
This seems to be happening a few times a day.
We have tried setting DD_PROFILING_HEAP_ENABLED=False and DD_PROFILING_MEMALLOC=0 in the environment, but the errors continue to appear.
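For illustration, a rough sketch of an equivalent in-process setup (the assumption that the variables need to be in place before ddtrace is imported is ours, not something confirmed by the docs):
# top of the service entrypoint (sketch): make the profiler settings visible
# before ddtrace is loaded
import os
os.environ.setdefault("DD_PROFILING_HEAP_ENABLED", "false")
os.environ.setdefault("DD_PROFILING_MEMALLOC", "0")
import ddtrace  # noqa: E402  -- imported only after the env vars are set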
Configuration in Django:
import os
from ddtrace import config, tracer
# Datadog setup: send traces to the agent on the node's host IP and enable the tracer
tracer.configure(hostname=os.environ.get("HOST_IP"))
tracer.configure(enabled=True)
tracer.set_tags(
{"env": os.environ.get("ENVIRONMENT"), "namespace": os.environ.get("NAMESPACE")}
)
config.django["analytics_enabled"] = True
config.django["cache_service_name"] = "xxx-cache"
config.django["database_service_name_prefix"] = "xxx"
config.django["distributed_tracing_enabled"] = True
config.django["instrument_middleware"] = True
config.django["service_name"] = "xxx"
We are facing the same issue (same ddtrace version) with a FastAPI app. I'd be happy to share any logs/config you need.
Could you share:
- which web server you're running?
- its configuration?
- whether you're using gevent?
I am using nginx feeding into gunicorn/gevent with a regular proxy setup. I'm not sure which configuration details would be helpful to you.
We have 2 different servers (not behind nginx):
- Flask
  - Web server: Gunicorn
  - Configuration: --worker-class=gthread
  - gevent: No
- FastAPI
  - Web server: Gunicorn
  - Configuration: --worker-class=uvicorn.workers.UvicornWorker
  - gevent: No
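(For reference, expressed as a gunicorn.conf.py the relevant part of those setups looks roughly like the sketch below; the bind address and worker count are made-up values, not taken from either service.)
# gunicorn.conf.py (sketch) -- equivalent of passing --worker-class on the command line
bind = "0.0.0.0:8000"  # made-up bind address
workers = 4  # made-up worker count
worker_class = "gthread"  # Flask service; the FastAPI one uses "uvicorn.workers.UvicornWorker"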
+1 We have the same issue with Gunicorn + Django
+1 same in 0.59.1
We are using Sanic with profiling enabled (DD_PROFILING_ENABLED=true). Dependencies:
datadog~=0.44.0
ddtrace~=0.59.1
requests~=2.27.1
sanic~=19.12.2

Currently having the same issue.
Same issue, with Flask/gunicorn/gevent, using ddtrace 1.2.1:
RuntimeError: the memalloc module was not started
File "ddtrace/internal/periodic.py", line 70, in run
self._target()
File "ddtrace/profiling/collector/__init__.py", line 42, in periodic
for events in self.collect():
File "ddtrace/profiling/collector/memalloc.py", line 145, in collect
events, count, alloc_count = _memalloc.iter_events()
Same issue with Flask/gunicorn/gevent as well using ddtrace 1.4.4:
This might be due to gevent monkey patching not being done properly. Is everyone using DD_GEVENT_PATCH_ALL=1?
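For anyone checking their setup, a minimal sketch of the import ordering that proper gevent patching implies (the myapp application factory is a placeholder, and this is an illustration rather than an officially documented configuration):
# wsgi.py (sketch) -- with gevent workers, monkey patching should happen
# before anything else (ddtrace, the web framework, ...) is imported
from gevent import monkey
monkey.patch_all()
from ddtrace import patch_all  # noqa: E402
patch_all()  # enable ddtrace integrations only after gevent has patched the stdlib
from myapp import create_app  # noqa: E402  -- placeholder application factory
application = create_app()
As we understand it, setting DD_GEVENT_PATCH_ALL=1 asks ddtrace-run to perform that early gevent patching itself instead of relying on the application to do it.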
@jd we're working on adding DD_GEVENT_PATCH_ALL as recommended by Datadog support. We also need to upgrade some other deps along the way to make it work. I'll post back if that gets us resolved.
@rgilkey did adding DD_GEVENT_PATCH_ALL resolve the issue? I'm facing the same problem with ddtrace v1.7.5.
We are currently facing the same issue; it seems to be a bit random and only appears sporadically... We use Flask + Gunicorn and ddtrace==1.0.1.
'Me too'ing this for Gunicorn, Uvicorn, FastAPI with ddtrace 1.10.2.
@swingingsimian can I please check with you whether you are seeing
RuntimeError: the memalloc module was not started
or
RuntimeError: the memalloc module is already started
or both?
I'm facing this issue too; weirdly, I'm only seeing it in our staging cluster and not in production. The only difference is more replicas and higher CPU limits in our production pods.
FastAPI, Gunicorn
gunicorn = "^20.1.0"
uvicorn = "^0.17.0"
[2023-04-18 21:41:23 +0000] [29] [INFO] Booting worker with pid: 29
Failed to start collector MemoryCollector(status=<ServiceStatus.STOPPED: 'stopped'>, recorder=Recorder(default_max_events=16384, max_events={<class 'ddtrace.profiling.collector.stack_event.StackSampleEvent'>: 30000, <class 'ddtrace.profiling.collector.stack_event.StackExceptionSampleEvent'>: 15000, <class 'ddtrace.profiling.collector.memalloc.MemoryAllocSampleEvent'>: 1920, <class 'ddtrace.profiling.collector.memalloc.MemoryHeapSampleEvent'>: None}), _max_events=16, max_nframe=64, heap_sample_size=1048576, ignore_profiler=False), disabling.
Traceback (most recent call last):
File "/opt/pysetup/.venv/lib/python3.11/site-packages/ddtrace/profiling/profiler.py", line 266, in _start_service
col.start()
File "/opt/pysetup/.venv/lib/python3.11/site-packages/ddtrace/internal/service.py", line 58, in start
self._start_service(*args, **kwargs)
File "/opt/pysetup/.venv/lib/python3.11/site-packages/ddtrace/profiling/collector/memalloc.py", line 108, in _start_service
_memalloc.start(self.max_nframe, self._max_events, self.heap_sample_size)
RuntimeError: the memalloc module is already started
We are also seeing this behavior as of yesterday in our development cluster:
Traceback (most recent call last):
File "/app/python/<app>/wsgi_image.binary.runfiles/pypi_ddtrace/site-packages/ddtrace/profiling/profiler.py", line 266, in _start_service
col.start()
File "/app/python/<app>/wsgi_image.binary.runfiles/pypi_ddtrace/site-packages/ddtrace/internal/service.py", line 58, in start
self._start_service(*args, **kwargs)
File "/app/python/<app>/wsgi_image.binary.runfiles/pypi_ddtrace/site-packages/ddtrace/profiling/collector/memalloc.py", line 108, in _start_service
_memalloc.start(self.max_nframe, self._max_events, self.heap_sample_size)
RuntimeError: the memalloc module is already started
[2023-04-18 22:45:49 +0000] [72] [INFO] Booting worker with pid: 72
I believe it was triggered after our node groups cycled out in dev; the containers don't want to come back online. Many of them are stuck in CrashLoopBackOff for a period of time. It does eventually stabilize... maybe this is an API issue on Datadog's end? We are running ddtrace 1.12.0.
Thanks for the reports @lfvarela @joesteffee. This exception is handled and so the only issue caused is that memory allocations won't be profiled. I think I have found the source of the problem, but can I please confirm with you that your services work and that this is just undesirable noise in your logs?
The new reports are a different issue that is a manifestation of a regression. The fix is in #5586.
Thanks! To confirm: our servers couldn't even start. Weirdly, by increasing the CPU limits on our servers we stopped seeing the issue.