
Please document how to fine tune mpt-7b with longer contexts

jwatte opened this issue 2 years ago · 3 comments

🚀 Feature Request

I believe that, to infer with a longer context, I can set max_seq_len to something longer when starting the Hugging Face-based inference driver. However, I don't understand how to increase max_seq_len when fine-tuning. Just raising it in the fine-tuning YAML doesn't work, because training complains that the model is limited to 2048 tokens.

Motivation

ALiBi is documented as supporting longer contexts, and the blog posts talk about it and point at the various fine-tuning YAML examples, but none of them show how to train with a longer context window.

Adding a little bit of documentation on how to actually use the ALiBi features would go a long way!

jwatte avatar Jun 23 '23 21:06 jwatte

Plus one to that. I am also wondering how to finetune on a longer context.

PNAKTEMPORAL avatar Jun 27 '23 08:06 PNAKTEMPORAL

Actually, the model does start with max_seq_len=8192, but it doesn't actually work. So, no idea how to use ALiBi here?

jwatte avatar Jun 27 '23 16:06 jwatte

Hey @jwatte, please check out https://github.com/mosaicml/llm-foundry/issues/380. It solved the issue for me.

As for ALiBi, according to @alextrott16, it is on by default.

PNAKTEMPORAL avatar Jun 27 '23 17:06 PNAKTEMPORAL

Interesting, so the model config override might do it for training.

For inference, it doesn't work by default:

python llm-foundry/scripts/inference/hf_generate.py -n mosaicml/mpt-7b-instruct --model_dtype=bf16 --autocast_dtype=bf16 --device=cuda --prompts 'file::question.txt'

AssertionError: Cannot forward input with seq_len=3561, this model only supports seq_len<=2048

For inference, when I run the regular model and configure it to use more tokens, the generated text is nonsense.

python llm-foundry/scripts/inference/hf_generate.py -n mosaicml/mpt-7b-instruct --model_dtype=bf16 --autocast_dtype=bf16 --device=cuda --max_seq_len=8192 --prompts 'file::question.txt'

2023-06-28 17:17:20.752574 ####################################################################################################
When we said that “A few words?”.
####################################################################################################

A simple invocation example that's expected to work would be very helpful.

jwatte avatar Jun 28 '23 15:06 jwatte

As the linked issue above explains, for training an MPT model, you want to make sure the config overrides inherit max_seq_len from the same setting defined at the top of the YAML. So, you add:

model:
  config_overrides:
    max_seq_len: ${max_seq_len}
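
For context, here is a rough sketch of how those pieces fit together in a finetuning YAML (based on my recollection of the example finetuning configs, so the surrounding keys in your own YAML may differ):

max_seq_len: 8192  # single source of truth for the sequence length

model:
  name: hf_causal_lm
  pretrained_model_name_or_path: mosaicml/mpt-7b
  pretrained: true
  config_overrides:
    max_seq_len: ${max_seq_len}

# The dataloader sections in the example configs also reference ${max_seq_len},
# so raising the top-level value keeps the data and the model in sync.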

When you use the --max_seq_len flag in hf_generate.py with an MPT model, it effectively does the same thing (it overrides the default max length with the supplied value). Note that if you train your own MPT model with a sequence length of, say, 8k and save it to an HF checkpoint, that checkpoint will already default to the 8k sequence length when you load it with hf_generate.py.
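
If you are loading the checkpoint yourself in Python rather than going through hf_generate.py, a minimal sketch of the equivalent override looks like this (standard transformers AutoConfig path with trust_remote_code, not anything specific to hf_generate.py):

import transformers

name = 'mosaicml/mpt-7b-instruct'

# Load the config first so max_seq_len can be raised before the weights are loaded.
# ALiBi is what lets the model attend beyond the 2048 tokens it was trained on.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 8192  # override the default 2048

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)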

In any case, @jwatte, the second command that you ran is the correct way to load mpt-7b-instruct with a longer sequence length in hf_generate.py. The output text could be bad because the model simply fails to do good inference at that length, because the generation parameters are suboptimal, or possibly because of the prompt you fed it. Note that the instruct model was trained with a particular instruction formatting. You can inspect this file to see the correct way to format the prompt for this model.
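
For reference, here is a sketch of the dolly-style template that mpt-7b-instruct was trained with (the exact strings below are recalled from the model card, so treat them as an assumption and check the linked file for the canonical version):

def format_instruct_prompt(instruction: str) -> str:
    # Dolly-style wrapper used for mpt-7b-instruct; see the linked file for the canonical template.
    intro = (
        'Below is an instruction that describes a task. '
        'Write a response that appropriately completes the request.'
    )
    return f'{intro}\n### Instruction:\n{instruction}\n### Response:\n'

# Example: write the formatted prompt to the file passed via --prompts 'file::question.txt'
with open('question.txt', 'w') as f:
    f.write(format_instruct_prompt('Summarize the plot of Hamlet in three sentences.'))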

alextrott16 avatar Jun 30 '23 22:06 alextrott16

At this point, I believe this issue is a duplicate of https://github.com/mosaicml/llm-foundry/issues/380, so I'm going to close this one.

alextrott16 avatar Jun 30 '23 23:06 alextrott16