Inference code for 2 models evaluated in the paper
Hello, thank you for releasing such a great codebase. In tutel/fairseq_moe I can see you used an MoE model from the fairseq repository for training. However, I didn't find the corresponding inference code. I also noticed that SwinV2-MoE was evaluated in the paper. Could you provide the inference code for these two models (under the Tutel framework)?
Hi, SwinMoE has an MoE-pretrained version; evaluation follows these instructions: https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md#evaluating-swin-moe
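For orientation, the evaluation command in the Swin-Transformer repo follows the same launch pattern as the dense models. A minimal sketch, assuming the MoE entry point is `main_moe.py` as in the linked instructions; the bracketed values are placeholders you would fill in from that page rather than actual file names:

```bash
# Hypothetical sketch -- substitute the real config / checkpoint / data paths
# from the linked get_started.md instructions.
python -m torch.distributed.launch --nproc_per_node <num-gpus> main_moe.py \
    --eval \
    --cfg <swin-moe-config>.yaml \
    --resume <swin-moe-checkpoint>.pth \
    --data-path <imagenet-path>
```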
If you have a different number of GPUs for fine-tuning / inference based on the pretrained checkpoint, you can further use this tool to convert the checkpoint files: https://github.com/microsoft/Tutel/blob/main/doc/CHECKPOINT.md#swin-transmformer-maintains-a-special-checkpoint-format-how-to-convert-swin-transformer-checkpoint-files-for-different-distributed-world-sizes
For Fairseq, these are the steps to use Tutel with it: https://github.com/microsoft/Tutel/tree/main/tutel/examples/fairseq_moe
Thank you for your quick reply. As far as I know, https://github.com/microsoft/Tutel/tree/main/tutel/examples/fairseq_moe only contains the training code for Fairseq. Do you have inference code that I could use to test the model easily?
Inference settings need to follow Fairseq's documentation, e.g. options like --forward-only.
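As a concrete starting point, the standard Fairseq evaluation entry points (e.g. `fairseq-generate` for translation or `fairseq-eval-lm` for language modeling) can be pointed at a trained checkpoint. This is only a sketch, assuming a translation setup, that the MoE checkpoint loads through the standard entry point, and placeholder paths; match the flags to your own training config per the Fairseq docs:

```bash
# Hypothetical sketch -- <data-bin-dir> and <checkpoint-dir> are placeholders;
# pick the entry point and flags that match how the model was trained.
fairseq-generate <data-bin-dir> \
    --path <checkpoint-dir>/checkpoint_best.pt \
    --batch-size 64 --beam 5 --remove-bpe
```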