llm2vec
How to run larger models like Llama-2-13B or 70B
Thank you for sharing your code. I was wondering how we can apply LLM2Vec to larger LLMs such as 13B or 70B?
Yes, it is definitely possible. All that is required is to change the model name and the batch size in the config.
For example, for Llama 13B, you should change `model_name_or_path` from `meta-llama/Llama-2-7b-chat-hf` to `meta-llama/Llama-2-13b-chat-hf`. The `per_device_train_batch_size` and `per_device_eval_batch_size` should also be adjusted accordingly.
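For illustration, the relevant fields in `train_configs/mntp/Llama.json` would look something like the following after the change (only the affected fields are shown; the batch size values here are assumptions and should be tuned to fit your GPU memory):

```json
{
    "model_name_or_path": "meta-llama/Llama-2-13b-chat-hf",
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4
}
```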
After that, the same training command should work:
python experiments/run_mntp.py train_configs/mntp/Llama.json
You will also need to add the model identifier here. These hardcoded model names will be removed in future versions of llm2vec.
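As a rough, hypothetical sketch of that change (the names below are illustrative and not the actual code in `experiments/run_mntp.py`), the larger checkpoints would simply be registered alongside the existing 7B entry:

```python
# Hypothetical sketch only: the real hardcoded list in llm2vec may have a
# different name and structure. The idea is to register the new checkpoint
# identifiers next to the existing 7B one.
SUPPORTED_MODEL_NAMES = [
    "meta-llama/Llama-2-7b-chat-hf",
    "meta-llama/Llama-2-13b-chat-hf",  # added for the 13B model (assumption)
    "meta-llama/Llama-2-70b-chat-hf",  # added for the 70B model (assumption)
]

def is_supported(model_name_or_path: str) -> bool:
    """Check whether the given checkpoint identifier is registered."""
    return model_name_or_path in SUPPORTED_MODEL_NAMES
```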
Feel free to re-open if you have any more questions about this issue.
Thank you for your answer. I did what you mentioned and the batch size is now 4, but it seems that distributed training is not activated and only one GPU is being used:
Process rank: 0, device: cuda:0, n_gpu: 4, distributed training: False, 16-bits training: False
If you think this is the source of the problem, do you have any idea how I can activate distributed training? I also took a look here, but found no clue.
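For context, HuggingFace Trainer-based scripts typically enable distributed training only when started through a distributed launcher rather than plain `python`. A sketch of such a launch, assuming 4 GPUs on one machine (llm2vec itself may recommend a different launcher such as `accelerate`):

torchrun --nproc_per_node=4 experiments/run_mntp.py train_configs/mntp/Llama.json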