gpu-bdb
Q23 intermittently freezing in nightly runs
In this morning's automated run, Q23 (TCP) completed once, then froze and was left running for hours with no progress.
Worker log
distributed.worker - ERROR - Set changed size during iteration
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
distributed.utils - ERROR - Set changed size during iteration
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/utils.py", line 655, in log_errors
    yield
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 2119, in gather_dep
    self.transition(ts, "memory", value=data[d])
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1539, in transition
    state = func(ts, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f5676beff50>>, <Task finished coro=<Worker.gather_dep() done, defined at /raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py:2000> exception=RuntimeError('Set changed size during iteration')>)
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 2119, in gather_dep
    self.transition(ts, "memory", value=data[d])
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1539, in transition
    state = func(ts, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
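For anyone unfamiliar with this error class: the log shows the exception raised at `for dep in ts.dependents:`, which is what Python raises when a set gains or loses elements while it is being iterated. The log does not show what mutates `ts.dependents`, so the following is only a minimal sketch of the error itself, not the actual `distributed` code or its fix. The function names and the `"-child"` mutation are hypothetical, chosen purely for illustration.

```python
def iterate_live(dependents):
    """Iterate the set directly while it is being mutated (raises)."""
    for dep in dependents:
        # Hypothetical stand-in for whatever changes the set mid-loop.
        dependents.add(dep + "-child")


def iterate_snapshot(dependents):
    """Iterate a list copy, so mutating the original set is safe."""
    visited = []
    for dep in list(dependents):
        dependents.add(dep + "-child")
        visited.append(dep)
    return visited


try:
    iterate_live({"a", "b"})
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

visited = iterate_snapshot({"a", "b"})
print(error_message)
print(sorted(visited))
```

Iterating over a snapshot avoids the exception, but whether that is the right remedy in the worker depends on why the set changes during the transition, which this log alone does not establish.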
Resolved
This issue is popping up again.