e5-mistral-7b-instruct
Exception when "--checkpointing_steps" is set
The source code of the Accelerate library shows that the `weights` list passed to save-state hooks is empty when the training task is launched via DeepSpeed.
https://github.com/huggingface/accelerate/blob/b8c85839531ded28efb77c32e0ad85af2062b27a/src/accelerate/accelerator.py#L2778-L2824
Therefore, an IndexError is raised in save_model_hook:
https://github.com/kamalkraj/e5-mistral-7b-instruct/blob/99021919b3c82bc67a4a897e8e9f39efe3d72cdc/peft_lora_embedding_semantic_search.py#L158-L162
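A minimal sketch of the failure mode and a defensive workaround, assuming the hook follows Accelerate's `(models, weights, output_dir)` save-hook signature; `DummyModel` is a hypothetical stand-in for the PEFT model:

```python
def save_model_hook(models, weights, output_dir):
    # Under DeepSpeed, Accelerate invokes save-state hooks with an empty
    # `weights` list (DeepSpeed persists the model state itself), so
    # indexing weights[i] raises IndexError. Guard before indexing.
    for i, model in enumerate(models):
        if not weights:  # DeepSpeed case: nothing to save here
            break
        model.save_pretrained(output_dir, state_dict=weights[i])

# Hypothetical stand-in for the real model, to illustrate both cases.
class DummyModel:
    def __init__(self):
        self.saved = []
    def save_pretrained(self, output_dir, state_dict=None):
        self.saved.append(state_dict)

m = DummyModel()
save_model_hook([m], [], "ckpt")          # DeepSpeed case: no-op, no IndexError
save_model_hook([m], [{"w": 1}], "ckpt")  # non-DeepSpeed case: saves once
```

This only avoids the crash; under DeepSpeed the checkpoint is written by DeepSpeed's own mechanism, so the hook has nothing to do.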
Another error: if "--checkpointing_steps" is set to "epoch", accelerator.save_state() times out, whereas it works when an integer is passed.
Hi, have you solved this problem?
@liujiqiang999 Do not register the hooks:

```python
# accelerator.register_save_state_pre_hook(save_model_hook)
# accelerator.register_load_state_pre_hook(load_model_hook)
```