
Worker proc VllmWorker-1 died unexpectedly

Open · Danau5tin opened this issue 8 months ago • 1 comment

I've already spoken a little with @bradhilton about this one! Posting it here in case others run into it too, and to keep track of it.

This only happens with multi-GPU. I'm using Qwen/Qwen3-0.6B on 2x A100s (80GB).

On the first step after rewards are calculated, I get this:

DEBUG 07-04 17:35:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
ERROR 07-04 17:35:16 [multiproc_executor.py:140] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
DEBUG 07-04 17:36:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
DEBUG 07-04 17:37:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
DEBUG 07-04 17:38:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.
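
For anyone comparing their own logs against the ones above, here is a minimal sketch that picks out the first ERROR line and measures how long after the first timeout message it appears. The regex is inferred only from the lines pasted in this issue and is not a documented vLLM log format. `parse_line`, `first_error`, and the `year` default are hypothetical names and values chosen for illustration; the year is needed because the log omits it.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines pasted in this issue (an assumption,
# not a documented vLLM format): LEVEL MM-DD HH:MM:SS [file:line] message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) (?P<stamp>\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def parse_line(line, year=2025):
    """Return (level, timestamp, source, message), or None if the line does not match.

    The log has no year, so one is supplied; 2025 is assumed from the issue date.
    """
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(f"{year}-{match['stamp']}", "%Y-%m-%d %H:%M:%S")
    return match["level"], stamp, match["source"], match["message"]


def first_error(lines, year=2025):
    """Return the first parsed record whose level is ERROR, or None."""
    for line in lines:
        record = parse_line(line, year)
        if record is not None and record[0] == "ERROR":
            return record
    return None


# The first two log lines from this issue, copied verbatim.
SAMPLE = [
    "DEBUG 07-04 17:35:10 [shm_broadcast.py:456] No available shared memory broadcast block found in 60 second.",
    "ERROR 07-04 17:35:16 [multiproc_executor.py:140] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.",
]

first_debug = parse_line(SAMPLE[0])
error = first_error(SAMPLE)
gap_seconds = (error[1] - first_debug[1]).total_seconds()
print(error[2], gap_seconds)
```

In the log above, the first timeout message is at 17:35:10 and the worker-death error at 17:35:16, after which the timeout message keeps repeating every 60 seconds.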

Danau5tin, Jul 04 '25 20:07