Davide Caffagni
Same problem on Ubuntu 18.04, and @linuxsen's fix worked for me! (`System.h`, not `system.h`)
[BUG] ZeRO 3 error: expected the next 4 parameters in the parameter fetch queue to be ... but got ()
It may be worth noting that the error happens right after the first detected **OVERFLOW** in the run. However, multiple overflows occurred during the previous 24h of training (before resuming...
I'm able to reproduce the error when resuming from a checkpoint using Hugging Face's Trainer API (`resume_from_checkpoint`) and simulating an overflow with ``` # self.test_of is True for the first...
I'm on a different project now, and I'm experiencing the same error even when training from scratch rather than resuming from a checkpoint. Again, the error appears after an...
Dear @lichenxi-cat and @hybra, we've just released LLaVA-MORE 8B on [Ollama](https://ollama.com/aimagelab/llava-more-8b). Best, Davide