mPLUG-DocOwl
Deadlock and CUDA OOM Issues with finetune_docowl.sh Using DeepSpeed Zero 2 and Zero 3
Hello. Thank you very much for sharing such great results. I really want to fine-tune and use this model.
As far as I understand, running finetune_docowl.sh deadlocks with DeepSpeed stage 3 and stage 3 with offload (ZeRO-3, ZeRO-Offload), and the same seems to happen with ZeRO-2 and ZeRO-3 in finetune_docowl_lora.sh.
I also haven't been able to run finetune_docowl.sh with ZeRO-2, because it fails with a CUDA OOM error.
Am I understanding this correctly? If you have resolved any of these deadlock issues, please share how.