zero-1 topic
zero-1 repositories
pipegoose
74 Stars
17 Forks
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*