llm-foundry
How to use train.py to finetune the pre-trained MPT-7B?
It seems like we need to run the pre-training process to get a checkpoint file before we can fine-tune the MPT model.
1b_local_data_sft.yaml mentions that we have to replace load_path with our own checkpoint path.
Can I use the .bin file on Hugging Face as the pre-trained model to load?
Hi @metacarbon, I have a PR here which hopefully addresses the issue you are running into. Basically, load_path is for Composer checkpoints; there is a different syntax for models loaded from the HF Hub in HF format. tl;dr: use this yaml.
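Roughly, the model section of that yaml swaps the Composer load_path for an HF Hub load along these lines (a minimal sketch; key names such as hf_causal_lm, pretrained, and pretrained_model_name_or_path follow typical llm-foundry finetune configs and may differ between versions, so treat the linked yaml as the source of truth):

```yaml
model:
  name: hf_causal_lm                              # loader for HF-format causal LMs (assumed name)
  pretrained: true                                # pull weights from the HF Hub, not a Composer checkpoint
  pretrained_model_name_or_path: mosaicml/mpt-7b  # the pretrained MPT-7B repo on the Hub
  config_overrides:
    attn_impl: triton                             # see the caveat below about leaving this out for now
```

The important difference from 1b_local_data_sft.yaml is that there is no load_path at all here; Composer checkpoints and HF-format checkpoints go through different loaders.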
However,

    config_overrides:
      attn_impl: triton

won't work until #90 is merged in. In the meantime you can just leave it out and it will use torch attention. The sequences are short enough that you shouldn't OOM, but you may need to lower the microbatch size if you run into memory issues (without Triton you don't have flash attention, so memory usage is much worse).
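If you do hit OOM with torch attention, the knob to turn is the per-device microbatch size in the trainer section, something like the sketch below (illustrative values; device_train_microbatch_size is the key used in the stock train yamls, and auto lets Composer pick a size that fits):

```yaml
# Illustrative batch-size settings; shrink the microbatch until it fits in memory.
global_train_batch_size: 8            # effective batch size, kept constant via gradient accumulation
device_train_microbatch_size: 1       # lower this first if torch attention runs out of memory
# device_train_microbatch_size: auto  # alternatively, let Composer find a size that fits
```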
@metacarbon The PR mentioned above has gone through and should make it easier to finetune from the pretrained MPT-7b model on Hugging Face.
Do you have more questions in this area? If not, I'll close the issue.
@alextrott16 I haven't had time to test it recently, but thanks to the PR I now know how to fine-tune MPT with HF. Thank you all for the support!
Happy to be of service :)