gpu-bdb

Q23 intermittently freezing in nightly runs

Open: beckernick opened this issue 5 years ago • 2 comments

In this morning's automated run, Q23 (TCP) completed once and then froze. It was left running for hours with no progress.

Worker log

distributed.worker - ERROR - Set changed size during iteration
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
distributed.utils - ERROR - Set changed size during iteration
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/utils.py", line 655, in log_errors
    yield
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 2119, in gather_dep
    self.transition(ts, "memory", value=data[d])
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1539, in transition
    state = func(ts, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f5676beff50>>, <Task finished coro=<Worker.gather_dep() done, defined at /raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py:2000> exception=RuntimeError('Set changed size during iteration')>)
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 2119, in gather_dep
    self.transition(ts, "memory", value=data[d])
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1539, in transition
    state = func(ts, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
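Every traceback above ends in the same plain-Python failure: a `for` loop over a set (`ts.dependents`) whose size changes before the loop finishes. The log doesn't show what mutates the set. The sketch below is a hypothetical, minimal reproduction of this error class in plain Python, not code from `distributed`. The helper names `mutate_while_iterating` and `iterate_snapshot` are illustrative only. The snapshot variant shows the general Python idiom of iterating over a copy. It is not necessarily how this bug was fixed upstream.

```python
def mutate_while_iterating(dependents):
    """Hypothetical repro: grow a set while a for-loop iterates over it.

    Python's set iterator detects the size change on the next step and raises
    RuntimeError, the same exception type and message seen in the worker log.
    """
    try:
        for dep in dependents:
            dependents.add(dep + 100)  # mutation during iteration
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_snapshot(dependents):
    """Same loop, but iterate over a snapshot so mutating the original set is safe."""
    for dep in list(dependents):  # list() copies the members before the loop starts
        dependents.add(dep + 100)
    return dependents


print(mutate_while_iterating({1, 2, 3}))
print(sorted(iterate_snapshot({1, 2, 3})))
```

Copying the set trades a small allocation for a stable iteration order. In a concurrent or re-entrant setting, it also means members added during the loop are not visited in that pass.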

beckernick, Dec 11 '20

Resolved

beckernick, Jan 22 '21

This issue has started occurring again.

beckernick, Feb 23 '21