llm-foundry
How to use train.py to finetune the pre-trained MPT-7B?
It seems like we need to run the pre-training process to get a checkpoint file before we can fine-tune the MPT model.
1b_local_data_sft.yaml mentions that we have to replace load_path with our own checkpoint path.
Can I use the .bin file on Hugging Face as the pre-trained model to load?
Hi @metacarbon, I have a PR here which hopefully addresses the issue you are running into. Basically, load_path is for Composer checkpoints; there is a different syntax for models loaded from the HF Hub in HF format. tl;dr: use this yaml.
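Roughly, the model section of that yaml swaps the Composer load_path for an HF Hub load along these lines (a minimal sketch; key names such as hf_causal_lm, pretrained, and pretrained_model_name_or_path follow typical llm-foundry finetune configs and may differ between versions, so treat the linked yaml as the source of truth):

```yaml
model:
  name: hf_causal_lm                              # loader for HF-format causal LMs (assumed name)
  pretrained: true                                # pull weights from the HF Hub, not a Composer checkpoint
  pretrained_model_name_or_path: mosaicml/mpt-7b  # the pretrained MPT-7B repo on the Hub
  config_overrides:
    attn_impl: triton                             # see the caveat below about leaving this out for now
```

The important difference from 1b_local_data_sft.yaml is that there is no load_path at all here; Composer checkpoints and HF-format checkpoints go through different loaders.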
However,

    config_overrides:
      attn_impl: triton

won't work until #90 is merged in. In the meantime you can just leave it out and it will use torch attention. The sequences are short enough that you shouldn't OOM, but you may need to lower the microbatch size if you run into memory issues (without Triton you don't have flash attention, so memory usage is much worse).
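If you do hit OOM with torch attention, the knob to turn is the per-device microbatch size in the trainer section, something like the sketch below (illustrative values; device_train_microbatch_size is the key used in the stock train yamls, and auto lets Composer pick a size that fits):

```yaml
# Illustrative batch-size settings; shrink the microbatch until it fits in memory.
global_train_batch_size: 8            # effective batch size, kept constant via gradient accumulation
device_train_microbatch_size: 1       # lower this first if torch attention runs out of memory
# device_train_microbatch_size: auto  # alternatively, let Composer find a size that fits
```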
@metacarbon The PR mentioned above has gone through and should make it easier to finetune from the pretrained MPT-7b model on Hugging Face.
Do you have more questions in this area? If not, I'll close the issue.
@alextrott16 I haven't had time to test it recently, but thanks to the PR I now know how to fine-tune MPT with HF. Thank you all for the support!
Happy to be of service :)