newrelic-python-agent

RuntimeError: dictionary changed size during iteration

Open mquillin opened this issue 3 years ago • 1 comment

Description I am not able to reopen an issue, so I am reposting this here; please reference #653.

list(dict.values()) still iterates over the dictionary in order to copy it into a list, so you're likely going to need a lock or some other concurrency mechanism.

We experienced an incident the other day where our servers were overloaded (100% CPU usage) and the newrelic agent was crashing and making things much worse. A dictionary is being modified while it is being iterated over. Looking at the code, there does not seem to be any protection against this happening.
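To make the lock suggestion concrete, here is a minimal sketch of what I mean. It is not the agent's code: `LockedTraceCache`, `Trace`, `add`, and `snapshot` are hypothetical names made up for illustration. The only assumption carried over from the traceback is that the cache is a weakref.WeakValueDictionary.

```python
import threading
import weakref


class Trace:
    """Hypothetical stand-in for a trace object (must be weak-referenceable)."""

    def __init__(self, name):
        self.name = name


class LockedTraceCache:
    """Hypothetical wrapper, NOT the agent's real trace cache.

    Every explicit write and every snapshot takes the same lock, so a
    writer that goes through this wrapper cannot resize the dictionary
    while a reader is copying its values.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = weakref.WeakValueDictionary()

    def add(self, key, trace):
        with self._lock:
            self._cache[key] = trace

    def snapshot(self):
        # Copy under the lock; callers then iterate over the returned list
        # without holding the lock.
        with self._lock:
            return list(self._cache.values())
```

This only serializes writers that go through the wrapper. Any code that touches the underlying dictionary directly would bypass the lock, so every access path would need to be covered.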

Expected Behavior The agent does not crash and handles concurrency properly.

Troubleshooting

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/newrelic/api/asgi_application.py", line 359, in nr_async_asgi
    return await coro
  File "/usr/local/lib/python3.9/site-packages/newrelic/common/async_proxy.py", line 148, in __next__
    return self.send(None)
  File "/usr/local/lib/python3.9/site-packages/newrelic/common/async_proxy.py", line 120, in send
    return self.__wrapped__.send(value)
  File "/usr/local/lib/python3.9/site-packages/newrelic/common/async_proxy.py", line 110, in __exit__
    trace_cache().record_event_loop_wait(self.enter_time, time.time())
  File "/usr/local/lib/python3.9/site-packages/newrelic/core/trace_cache.py", line 362, in record_event_loop_wait
    for trace in self._cache.values():
  File "/usr/local/lib/python3.9/weakref.py", line 248, in values
    for wr in self.data.values():
RuntimeError: dictionary changed size during iteration

Steps to Reproduce This is not very easy to reproduce, but if needed I can try to come up with an application. Reading the code, I did not see anything that protects against this scenario; the fix could be as simple as iterating over a copy of the dict.
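For reference, the error itself is easy to trigger in isolation. The last frame of the traceback shows weakref.py iterating over a plain underlying dict (`self.data.values()`), so this sketch uses a plain dict. The function names and data are made up for illustration, and it is single-threaded, so it demonstrates the failure mode rather than the actual race in the agent.

```python
def iterate_while_mutating():
    """Insert into a dict during iteration and return the error message."""
    data = {1: "a", 2: "b"}
    try:
        for _ in data.values():
            # Adding a key changes the dict's size mid-iteration.
            data[len(data) + 1] = "x"
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot():
    """Same mutation, but iterating over a list copy of the values."""
    data = {1: "a", 2: "b"}
    visited = 0
    for _ in list(data.values()):
        data[len(data) + 1] = "x"
        visited += 1
    # The snapshot had two entries, so the loop runs twice and adds two keys.
    return visited, len(data)
```

The snapshot version is safe here only because nothing else can run while the copy is taken. With multiple threads, the copy itself has to be protected as well.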

Your Environment

fastapi==0.75.0
uvicorn==0.18.3
gunicorn==20.1.0
newrelic==8.1.0.180

Python FastAPI application running in a docker container in kubernetes.

mquillin Oct 14 '22 16:10

We need to do a dict.copy() instead of the previous fix.

hmstepanek Oct 17 '22 21:10