e5-mistral-7b-instruct

Exception when "--checkpointing_steps" is set

Open Hypothesis-Z opened this issue 1 year ago • 2 comments

The Accelerate source code shows that the weights list passed to the save-state hooks is empty when the training job is launched via DeepSpeed.

https://github.com/huggingface/accelerate/blob/b8c85839531ded28efb77c32e0ad85af2062b27a/src/accelerate/accelerator.py#L2778-L2824

Therefore, an IndexError is raised in save_model_hook when it indexes into that empty list.

https://github.com/kamalkraj/e5-mistral-7b-instruct/blob/99021919b3c82bc67a4a897e8e9f39efe3d72cdc/peft_lora_embedding_semantic_search.py#L158-L162
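One possible way to avoid the IndexError without dropping the hook is to return early when no weights are passed in. The snippet below is only a sketch, not the repository's code: it assumes the hook is called as (models, weights, output_dir) and that each model exposes save_pretrained(output_dir, state_dict=...). It also assumes that an empty weights list means the backend has already written its own checkpoint.

```python
def save_model_hook(models, weights, output_dir):
    """Save each model only when a matching state dict was provided."""
    if not weights:
        # Assumption: an empty list means the backend (e.g. DeepSpeed) has
        # already written its own checkpoint, so there is nothing to save here.
        return
    # Pair each model with its state dict instead of indexing by position,
    # so a length mismatch cannot raise IndexError.
    for model, state_dict in zip(models, list(weights)):
        model.save_pretrained(output_dir, state_dict=state_dict)
    # Empty the caller's list in place so the same weights are not saved again.
    weights.clear()
```

With this guard the hook becomes a no-op under DeepSpeed, so any adapter-only export would still have to be done separately.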


A second problem: if "--checkpointing_steps" is set to "epoch", accelerator.save_state() times out, whereas it works when an integer is given.

Hypothesis-Z, Apr 10 '24 09:04

Hi, have you solved this problem?

liujiqiang999, Jun 05 '24 03:06

@liujiqiang999 Do not register the hooks. Comment out these two calls:

    # accelerator.register_save_state_pre_hook(save_model_hook)
    # accelerator.register_load_state_pre_hook(load_model_hook)

Hypothesis-Z, Jun 07 '24 09:06