newrelic-python-agent
RuntimeError: dictionary changed size during iteration
Description I am not able to reopen the issue, so I am reposting it here; please reference #653.
list(dict.values()) still iterates over the dictionary in order to copy its values into a list, so the race remains. You will likely need a lock or some other concurrency mechanism.
We experienced an incident the other day where our servers were overloaded (100% CPU usage) and the newrelic agent was crashing, which made things much worse. A dictionary is being modified while it is being iterated over, and looking at the code there does not seem to be any protection against this.
Expected Behavior The agent does not crash and handles concurrency properly.
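For anyone unfamiliar with this error, here is a minimal sketch of the failure mode. It is a simplified, single-threaded stand-in using a plain dict, not the agent's actual code; in the agent, the mutation presumably comes from another thread touching the trace cache. The function name is made up for illustration.

```python
def iterate_while_mutating():
    """Hypothetical helper: add a key to a dict while looping over it."""
    cache = {1: "a", 2: "b"}
    try:
        for key in cache:
            # Inserting during iteration changes the dict's size, which
            # CPython detects on the next step of the loop.
            cache[key + 100] = "new"
    except RuntimeError as exc:
        return str(exc)
    return None


message = iterate_while_mutating()
print(message)
```

Wrapping the same loop in list(cache.values()) only moves the iteration inside the list() call; it does not stop another thread from resizing the dict while that copy is being built.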
Troubleshooting
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 404, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/newrelic/api/asgi_application.py", line 359, in nr_async_asgi
    return await coro
  File "/usr/local/lib/python3.9/site-packages/newrelic/common/async_proxy.py", line 148, in __next__
    return self.send(None)
  File "/usr/local/lib/python3.9/site-packages/newrelic/common/async_proxy.py", line 120, in send
    return self.__wrapped__.send(value)
  File "/usr/local/lib/python3.9/site-packages/newrelic/common/async_proxy.py", line 110, in __exit__
    trace_cache().record_event_loop_wait(self.enter_time, time.time())
  File "/usr/local/lib/python3.9/site-packages/newrelic/core/trace_cache.py", line 362, in record_event_loop_wait
    for trace in self._cache.values():
  File "/usr/local/lib/python3.9/weakref.py", line 248, in values
    for wr in self.data.values():
RuntimeError: dictionary changed size during iteration
Steps to Reproduce This is not easy to reproduce, but if needed I can try to put together an application that triggers it. Reading the code, I did not see how this scenario is protected against; the fix could be as simple as iterating over a copy of the dict.
Your Environment
fastapi==0.75.0
uvicorn==0.18.3
gunicorn==20.1.0
newrelic==8.1.0.180
Python FastAPI application running in a docker container in kubernetes.
We need to use dict.copy() instead of the previous fix.
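As a rough sketch of the lock-based alternative mentioned in the description: the classes below (Trace, GuardedTraceCache) are hypothetical and are not the agent's real trace cache. The idea is simply that every mutation and every snapshot takes the same lock, so the copy can never overlap with a resize, and callers iterate over the resulting list instead of the live dictionary.

```python
import threading
import weakref


class Trace:
    """Hypothetical stand-in for a trace object (must be weak-referenceable)."""

    def __init__(self, name):
        self.name = name


class GuardedTraceCache:
    """Hypothetical cache: mutations and snapshots share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = weakref.WeakValueDictionary()

    def save(self, key, trace):
        with self._lock:
            self._cache[key] = trace

    def drop(self, key):
        with self._lock:
            self._cache.pop(key, None)

    def snapshot(self):
        # Build the list while holding the lock so no other thread can
        # add or remove keys mid-copy; callers then iterate the list.
        with self._lock:
            return list(self._cache.values())


cache = GuardedTraceCache()
traces = [Trace(f"t{i}") for i in range(3)]  # strong refs keep entries alive
for i, t in enumerate(traces):
    cache.save(i, t)

seen = []
for trace in cache.snapshot():
    # Mutating the cache here is safe: the loop walks the snapshot list.
    cache.save(trace.name, trace)
    seen.append(trace.name)
```

The trade-off is that every writer now contends on the lock, so the critical sections should stay as short as they are here. Whether that cost is acceptable on a hot path like record_event_loop_wait is something the maintainers would have to measure.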