mistral-inference
How to finetune mistral-moe with expert/data/pipeline parallelism?
The provided code appears to target a single GPU. Are there any tutorials for finetuning mistral-moe with expert/data/pipeline parallelism?
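To clarify what I mean by expert parallelism, here is a toy sketch in plain Python (no real framework, and not mistral-inference code): each rank holds a disjoint subset of the experts, and each token is dispatched to whichever rank owns the expert the gate selected. All function names and the 8-expert/4-rank numbers are just illustrative.

```python
# Toy illustration of expert parallelism: shard N experts across R ranks.
# Conceptual sketch only -- not Mixtral or mistral-inference internals.

def shard_experts(num_experts: int, num_ranks: int) -> dict:
    """Assign each expert id to a rank, round-robin."""
    shards = {r: [] for r in range(num_ranks)}
    for e in range(num_experts):
        shards[e % num_ranks].append(e)
    return shards

def owner_rank(expert_id: int, num_ranks: int) -> int:
    """Which rank holds this expert under the round-robin sharding."""
    return expert_id % num_ranks

# Example: 8 experts sharded over 4 GPUs doing expert parallelism.
shards = shard_experts(num_experts=8, num_ranks=4)
print(shards)  # each rank owns 2 of the 8 experts

# Routing: a gate picks an expert per token; tokens are then sent
# (via all-to-all in a real system) to the rank owning that expert.
tokens_to_experts = [0, 3, 5, 6]  # hypothetical gate outputs for 4 tokens
dispatch = [owner_rank(e, num_ranks=4) for e in tokens_to_experts]
print(dispatch)  # rank that would process each token
```

The question is how to combine this kind of expert sharding with data and pipeline parallelism during finetuning, since the repo's code path seems to assume everything fits on one device.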