tongye98

2 issues by tongye98

**Describe the bug** Not a bug, but a suggestion for enhancement. The current solution for multi-GPU training in joeynmt-2.0 uses `nn.DataParallel`. But the flaw of `nn.DataParallel` is obvious and PyTorch...

enhancement
help wanted

When fine-tuning the DeepSeek-Coder-V2-Lite-Base model with LoRA, GPU utilization only reaches about 40%. Is this because of the MoE architecture?