
Use EasyLM to pre-train llama-7B using Nvidia GPU

zhpacer opened this issue 1 year ago · 2 comments

Is there a training script to pre-train a llama-7B model on GPUs such as the A100? The current examples are based on TPUs, and I'm not sure whether there are any differences. Thanks.

zhpacer · Jul 24 '23

I believe the configuration would be very similar, although you might need to tune the mesh dimensions according to your cluster configuration and network topology to get the best performance. Specifically, you'll want to add these options when training on GPUs in a multihost environment:

python -m EasyLM.models.llama.llama_train \
    --jax_distributed.initialize_jax_distributed=True \
    --jax_distributed.coordinator_address=<your coordinator (process 0) address and port> \
    --jax_distributed.num_processes=<total number of processes (hosts)> \
    --jax_distributed.process_id=<current process id>
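
For concreteness, here is a hypothetical invocation (an editorial sketch, not from the original thread) for a two-host cluster with 8 A100s per host, 16 devices in total. The coordinator address, process count, and mesh_dim value are illustrative assumptions; mesh_dim sets EasyLM's data/FSDP/model parallelism axes and should be tuned to your topology, as noted above:

# Assumed setup: 2 hosts x 8 A100s = 16 devices. Run once per host,
# changing --jax_distributed.process_id (0 on the coordinator host, 1 on the other).
python -m EasyLM.models.llama.llama_train \
    --mesh_dim='1,16,1' \
    --jax_distributed.initialize_jax_distributed=True \
    --jax_distributed.coordinator_address='10.0.0.1:1234' \
    --jax_distributed.num_processes=2 \
    --jax_distributed.process_id=0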

young-geng · Jul 24 '23

Great, thanks, I will give it a try.

zhpacer · Jul 24 '23