tongye98
Results: 2 issues of tongye98
**Describe the bug** Not a bug, but a suggestion for enhancement. The current multi-GPU training solution in joeynmt-2.0 uses `nn.DataParallel`. But the flaw of `nn.DataParallel` is obvious and PyTorch...
enhancement
help wanted
When LoRA fine-tuning the DeepSeek-Coder-V2-Lite-Base model, GPU utilization only reaches about 40%. Is this because of the MoE architecture?