Fix deadlock in results_queue.join() during training

Open benediktstroebl opened this issue 5 months ago • 0 comments

Add a 10-second timeout to results_queue.join() to prevent indefinite hangs when lingering results aren't properly consumed. If a timeout occurs, drain any remaining items from the queue to allow training to continue.

This fixes an issue where training could deadlock between steps if results from a previous step remained unprocessed in the queue.
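
For illustration, here is a minimal sketch of the timeout-then-drain pattern described above. It assumes results_queue is an asyncio.Queue, which this issue does not confirm. The helper name join_or_drain is hypothetical and is not taken from the actual patch. Because asyncio.Queue.join() takes no timeout argument, the sketch wraps it in asyncio.wait_for.

```python
import asyncio


async def join_or_drain(results_queue: asyncio.Queue, timeout: float = 10.0) -> int:
    """Wait for the queue to be fully processed; on timeout, discard leftovers.

    Hypothetical helper, not the project's actual code.
    Returns the number of items drained (0 if join() finished in time).
    """
    try:
        # join() only returns once task_done() has been called for every item,
        # so an unconsumed result would otherwise block here indefinitely.
        await asyncio.wait_for(results_queue.join(), timeout=timeout)
        return 0
    except asyncio.TimeoutError:
        drained = 0
        while True:
            try:
                results_queue.get_nowait()
            except asyncio.QueueEmpty:
                break
            # Mark each discarded item as done so the unfinished-task
            # counter goes down along with the queue contents.
            results_queue.task_done()
            drained += 1
        return drained
```

Draining removes only items still sitting in the queue. If a consumer already took an item and never called task_done(), a later join() would still block, so this sketch does not call join() again after draining.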

benediktstroebl · Oct 07 '25 09:10