ART
Worker proc VllmWorker-1 died unexpectedly
I've already spoken a little with @bradhilton about this one. Posting it here in case others run into it too, and to keep track of it.
This only happens with multi-GPU. I'm using Qwen/Qwen3-0.6B on 2x A100s (80GB).
On the first step after rewards are calculated, I get this:
DEBUG 07-04 17:35:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
ERROR 07-04 17:35:16 [multiproc_executor.py:140] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
DEBUG 07-04 17:36:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
DEBUG 07-04 17:37:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
DEBUG 07-04 17:38:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
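For anyone comparing their own output against this, here is a minimal sketch of a log summarizer. It assumes the line layout seen in the excerpt above (level, timestamp, `[file:line]`, message); `summarize` and `SAMPLE` are hypothetical names for illustration and not part of vLLM or ART. It only counts the shared-memory stall messages and finds the worker death line, so it is a triage aid rather than a diagnosis.

```python
import re

# Layout assumed from the excerpt above: LEVEL MM-DD HH:MM:SS [file:line] message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) (?P<ts>\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
DEATH = re.compile(r"Worker proc (\S+) died unexpectedly")
STALL = "No available shared memory broadcast block found"

# The log lines from this report, used as sample input.
SAMPLE = """\
DEBUG 07-04 17:35:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
ERROR 07-04 17:35:16 [multiproc_executor.py:140] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
DEBUG 07-04 17:36:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
DEBUG 07-04 17:37:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
DEBUG 07-04 17:38:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
"""


def summarize(log_text):
    """Count broadcast stalls and locate the first worker death in a log."""
    summary = {
        "stall_count": 0,         # total stall messages seen
        "stalls_after_death": 0,  # stall messages logged after the death line
        "first_stall": None,      # timestamp string of the first stall
        "worker_death": None,     # timestamp string of the death line
        "dead_worker": None,      # worker name taken from the death line
    }
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # skip anything that does not follow the assumed layout
        msg = match["msg"]
        death = DEATH.search(msg)
        if death and summary["worker_death"] is None:
            summary["worker_death"] = match["ts"]
            summary["dead_worker"] = death.group(1)
        elif STALL in msg:
            summary["stall_count"] += 1
            if summary["first_stall"] is None:
                summary["first_stall"] = match["ts"]
            if summary["worker_death"] is not None:
                summary["stalls_after_death"] += 1
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the log above this reports four stall messages, three of them after VllmWorker-1 died, which would be consistent with the executor continuing to wait on a worker that is already gone.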