Loading single-GPU trained parameters for multi-GPU inference (SwinMoE)
Hello.
I'm trying to load a SwinMoE model that was trained on a single GPU and run inference on 4 GPUs.
I'm using 8 experts per MoE layer.
It seems that my checkpoint contains the parameters of all 8 experts at once, while with 4 GPUs each rank only expects its 2 local experts' parameters, so the shapes don't match.
Is there any way to fix this, or do I need to re-train with 4 GPUs? I've sketched a possible workaround after the traceback below.
Here's the error:
```
Traceback (most recent call last):
  File "main_moe.py", line 367, in <module>
    main(config)
  File "main_moe.py", line 139, in main
    max_accuracy = load_checkpoint(config, model_without_ddp, optimizer, lr_scheduler, loss_scaler, logger)
  File "/root/Swin-Transformer_ranggi/utils_moe.py", line 44, in load_checkpoint
    msg = model.load_state_dict(checkpoint['model'], strict=False)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1370, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SwinTransformerMoE:
    size mismatch for layers.2.blocks.1.mlp._moe_layer.experts.batched_fc1_w: copying a param with shape torch.Size([8, 2048, 512]) from checkpoint, the shape in current model is torch.Size([2, 2048, 512]).
```
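For what it's worth, here is the rough workaround I'm considering: slicing each full-expert tensor in the checkpoint down to the current rank's experts before calling `load_state_dict`. The helper name `shard_moe_checkpoint` is just my own sketch, and it assumes that expert parameters stack the experts along dim 0 and that rank r owns the contiguous slice of experts `[r*2, r*2+2)`; I haven't verified that this matches Tutel's actual expert placement, so the indexing may need to change.

```python
import torch
import torch.distributed as dist


def shard_moe_checkpoint(state_dict, num_local_experts, rank):
    """Slice global-expert tensors down to this rank's local experts.

    Assumes every expert parameter (e.g. batched_fc1_w) stacks the experts
    along dim 0, shape [num_global_experts, ...], and that rank r owns the
    contiguous slice [r * num_local_experts : (r + 1) * num_local_experts].
    Both assumptions may differ from Tutel's real expert-to-rank mapping.
    """
    start = rank * num_local_experts
    end = start + num_local_experts
    sharded = {}
    for name, param in state_dict.items():
        if '_moe_layer.experts.' in name:
            # e.g. batched_fc1_w: [8, 2048, 512] -> [2, 2048, 512] on 4 ranks
            sharded[name] = param[start:end].clone()
        else:
            sharded[name] = param
    return sharded


# Hypothetical usage inside utils_moe.load_checkpoint, before load_state_dict:
# rank = dist.get_rank()
# checkpoint['model'] = shard_moe_checkpoint(checkpoint['model'],
#                                            num_local_experts=2, rank=rank)
# msg = model.load_state_dict(checkpoint['model'], strict=False)
```

Would something like this be reasonable, or is re-training with 4 GPUs the only safe option?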