Jiarui Fang(方佳瑞)

Results: 220 comments by Jiarui Fang(方佳瑞)

@shadow150519 Thank you very much for pointing out the version incompatibility. Could you please open a Merge Request against this repo? I would like to recognize your contribution here.

Could you please refer to this PR in FlagScale, a framework built on Megatron-LM? https://github.com/FlagOpen/FlagScale/pull/156

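Yes, gbs=8 and the sequence length is 4K. We can use DP and split along the gbs dimension, or use SP and split along the sequence dimension. The figure compares the pros and cons of these two ways of splitting.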
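To make the two splits concrete, here is a minimal sketch of the per-device shard shape under each scheme. The helper name `shard_shape` and the device count of 8 are assumptions for illustration only; they do not come from the original thread or from any library.

```python
def shard_shape(gbs, seq_len, world_size, mode):
    """Return the per-device (batch, sequence) shape for a given split.

    mode="dp" splits the global batch dimension; mode="sp" splits the
    sequence dimension. Hypothetical helper, for illustration only.
    """
    if mode == "dp":
        if gbs % world_size != 0:
            raise ValueError("gbs must be divisible by world_size for DP")
        return (gbs // world_size, seq_len)
    if mode == "sp":
        if seq_len % world_size != 0:
            raise ValueError("seq_len must be divisible by world_size for SP")
        return (gbs, seq_len // world_size)
    raise ValueError(f"unknown mode: {mode}")


# gbs=8 and a 4K sequence, as in the comment; 8 devices is an assumption.
gbs, seq_len, world_size = 8, 4096, 8

dp_shape = shard_shape(gbs, seq_len, world_size, "dp")  # each device: 1 full sequence
sp_shape = shard_shape(gbs, seq_len, world_size, "sp")  # each device: 8 sequence slices

print("DP shard:", dp_shape)
print("SP shard:", sp_shape)
```

Both splits leave the same number of tokens on each device, so the difference between them lies in what has to be communicated, not in how much input data each device holds.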

See the communication analysis table in the paper. SP simply has a higher communication volume than DP.

All2All needs some temporary buffers for async P2P. Could you post the memory difference? In my experience it is very small.

Yes, you just need to add the NPU spec to this dict: https://github.com/feifeibear/LLMRoofline/blob/main/hardware_dict.py

Thank you both for your careful observations; these details are very helpful. Would it be possible to change the comparison of two float values with '==' to the...
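The comment above is truncated, so the replacement it proposes is not known. As a general illustration of why `==` is fragile for floats, the sketch below uses Python's standard `math.isclose` as one common alternative; the helper name `floats_match` and the tolerance values are assumptions, not the fix suggested in the thread.

```python
import math


def floats_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two floats with a tolerance instead of exact equality.

    Hypothetical helper; the tolerances are illustrative defaults and
    should be chosen for the precision of the values being compared.
    """
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


# Rounding error makes the exact comparison fail even though the
# values are mathematically equal.
exact = (0.1 + 0.2 == 0.3)
tolerant = floats_match(0.1 + 0.2, 0.3)

print("exact comparison:   ", exact)
print("tolerant comparison:", tolerant)
```

For tensors, a tolerance-based check such as `torch.allclose` serves the same purpose; the right tolerance depends on the dtype in use.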

Additionally, I strongly recommend using yunchang's USP, which is a hybrid parallelism approach combining ring and Ulysses. This not only achieves higher training TFlops but also simplifies your code, requiring...

> Hello, thank you for your appreciation of our work. We also noticed USP and tried to utilize it in our experiments. However, we consistently encountered library compatibility issues and...

@Lins-01 Thanks for your contributions to veRL! I noticed that some of the code appears to be drawn from the following projects: [fufankeji/fufan-chat-api](https://github.com/fufankeji/fufan-chat-api) (encoder.py) and [RUC-NLPIR/FlashRAG](https://github.com/RUC-NLPIR/FlashRAG) (utils.py). Could you please check...