mistral-inference
How to finetune mistral-moe with expert/data/pipeline parallelism?
The provided code appears to target a single GPU. Are there any tutorials for finetuning mistral-moe with expert/data/pipeline parallelism?
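To clarify what I mean by expert parallelism, here is a toy sketch in plain Python (no real framework, and not mistral-inference code): each rank holds a disjoint subset of the experts, and each token is dispatched to whichever rank owns the expert the gate selected. All function names and the 8-expert/4-rank numbers are just illustrative.

```python
# Toy illustration of expert parallelism: shard N experts across R ranks.
# Conceptual sketch only -- not Mixtral or mistral-inference internals.

def shard_experts(num_experts: int, num_ranks: int) -> dict:
    """Assign each expert id to a rank, round-robin."""
    shards = {r: [] for r in range(num_ranks)}
    for e in range(num_experts):
        shards[e % num_ranks].append(e)
    return shards

def owner_rank(expert_id: int, num_ranks: int) -> int:
    """Which rank holds this expert under the round-robin sharding."""
    return expert_id % num_ranks

# Example: 8 experts sharded over 4 GPUs doing expert parallelism.
shards = shard_experts(num_experts=8, num_ranks=4)
print(shards)  # each rank owns 2 of the 8 experts

# Routing: a gate picks an expert per token; tokens are then sent
# (via all-to-all in a real system) to the rank owning that expert.
tokens_to_experts = [0, 3, 5, 6]  # hypothetical gate outputs for 4 tokens
dispatch = [owner_rank(e, num_ranks=4) for e in tokens_to_experts]
print(dispatch)  # rank that would process each token
```

The question is how to combine this kind of expert sharding with data and pipeline parallelism during finetuning, since the repo's code path seems to assume everything fits on one device.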