XLzed
**Describe the bug** Context parallel does not work in some cases, such as pretraining llama-34b on 64 A800 GPUs with seqlen>=32768. **But using megatron-lm directly has no problem with the...
**Describe the bug** Some data is lost during transmission, which causes an exception during grpc http2 deframing, and the netty benchmark example hangs while waiting for all of the data. **Steps to Reproduce**...
### Issue When testing the performance of DeepSeek-v2-lite using Megatron and TransformerEngine, I encountered an issue where GroupedLinear takes unusually long to run. The TEGroupedLinear forward operation typically takes about 1ms...
**Describe the bug** When resuming from a checkpoint in mcore-0.12, the first forward pass does not produce the same loss value, and the subsequent loss deviation is relatively large. But mcore-0.11 can...