llm2vec
How to run larger models like Llama-2-13B or 70B
Thank you for sharing your code. I was wondering how we can apply LLM2Vec to larger LLMs such as 13B or 70B?
Yes, it is definitely possible. All that is required is to change the model name and the batch size in the config.
For example, for Llama 13B, you should change `model_name_or_path` from `meta-llama/Llama-2-7b-chat-hf` to `meta-llama/Llama-2-13b-chat-hf`. The `per_device_train_batch_size` and `per_device_eval_batch_size` should also be adjusted accordingly.
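For illustration, the relevant fields in `train_configs/mntp/Llama.json` would look something like the following after the change (only the affected fields are shown; the batch size values here are assumptions and should be tuned to fit your GPU memory):

```json
{
    "model_name_or_path": "meta-llama/Llama-2-13b-chat-hf",
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4
}
```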
After that, the same training command should work:
python experiments/run_mntp.py train_configs/mntp/Llama.json
You will also need to add the model identifier here. These hardcoded model names will be removed in future versions of llm2vec.
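As a rough, hypothetical sketch of that change (the names below are illustrative and not the actual code in `experiments/run_mntp.py`), the larger checkpoints would simply be registered alongside the existing 7B entry:

```python
# Hypothetical sketch only: the real hardcoded list in llm2vec may have a
# different name and structure. The idea is to register the new checkpoint
# identifiers next to the existing 7B one.
SUPPORTED_MODEL_NAMES = [
    "meta-llama/Llama-2-7b-chat-hf",
    "meta-llama/Llama-2-13b-chat-hf",  # added for the 13B model (assumption)
    "meta-llama/Llama-2-70b-chat-hf",  # added for the 70B model (assumption)
]

def is_supported(model_name_or_path: str) -> bool:
    """Check whether the given checkpoint identifier is registered."""
    return model_name_or_path in SUPPORTED_MODEL_NAMES
```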
Feel free to re-open if you have any more questions about this issue.
Thank you for your answer. I did what you mentioned and the batch size is now 4, but it seems that distributed training is not activated and only one GPU is being used:
Process rank: 0, device: cuda:0, n_gpu: 4, distributed training: False, 16-bits training: False
If you think this is the source of the problem, do you have any idea how I can activate distributed training? I also took a look here, but found no clue.
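For context, HuggingFace Trainer-based scripts typically enable distributed training only when started through a distributed launcher rather than plain `python`. A sketch of such a launch, assuming 4 GPUs on one machine (llm2vec itself may recommend a different launcher such as `accelerate`):

torchrun --nproc_per_node=4 experiments/run_mntp.py train_configs/mntp/Llama.json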