
Use EasyLM to pre-train llama-7B using Nvidia GPU

zhpacer opened this issue 1 year ago · 2 comments

Is there a training script to pre-train a llama-7B model on GPUs such as the A100? The current examples are based on TPUs, and I'm not sure whether there are any differences. Thanks.

zhpacer · Jul 24 '23

I believe the configuration would be very similar, although you might need to tune the mesh dimensions according to your cluster configuration and network topology to get the best performance. Specifically, you'll want to add these options when training on GPUs in a multihost environment:

python -m EasyLM.models.llama.llama_train \
    --jax_distributed.initialize_jax_distributed=True \
    --jax_distributed.coordinator_address=<your coordinator (process 0) address and port> \
    --jax_distributed.num_processes=<total number of processes (hosts)> \
    --jax_distributed.process_id=<current process id>
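
For concreteness, here is a hypothetical invocation (an editorial sketch, not from the original thread) for a two-host cluster with 8 A100s per host, 16 devices in total. The coordinator address, process count, and mesh_dim value are illustrative assumptions; mesh_dim sets EasyLM's data/FSDP/model parallelism axes and should be tuned to your topology, as noted above:

# Assumed setup: 2 hosts x 8 A100s = 16 devices. Run once per host,
# changing --jax_distributed.process_id (0 on the coordinator host, 1 on the other).
python -m EasyLM.models.llama.llama_train \
    --mesh_dim='1,16,1' \
    --jax_distributed.initialize_jax_distributed=True \
    --jax_distributed.coordinator_address='10.0.0.1:1234' \
    --jax_distributed.num_processes=2 \
    --jax_distributed.process_id=0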

young-geng · Jul 24 '23

Great, thanks, I will give it a try.

zhpacer · Jul 24 '23