Liang Shining

10 comments by Liang Shining

I agree with the approach suggested above: a WeChat Official Account can be used to promote the project on major community sites and drive traffic.

@YasinZhao @loveJasmine Thanks to both of you for the feedback; I will look into it soon.

> But I have a question: I couldn't find the implementation of the self.attention part the repo owner mentioned. Did the repo owner treat attention flow as self-attention?

Yes, exactly. A form of self-attention is added inside the attention flow; it is different from the current Transformer-based self-attention.
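For context, here is a minimal sketch of the kind of self-attention layer that can be stacked on top of the attention-flow output. This is an assumption-laden illustration in PyTorch with hypothetical names (`ContextSelfAttention`, input `g` as the attention-flow output), not the repo's actual code:

```python
import torch
import torch.nn as nn

class ContextSelfAttention(nn.Module):
    """Hypothetical self-attention applied after the attention-flow (C2Q/Q2C) step."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Single linear projection to compute similarity scores between context positions
        self.score = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, g: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        # g: (batch, ctx_len, hidden) -- the attention-flow output
        scores = torch.bmm(self.score(g), g.transpose(1, 2))  # (batch, ctx_len, ctx_len)
        if mask is not None:
            # mask: (batch, ctx_len) boolean; block attention to padding positions
            scores = scores.masked_fill(~mask.unsqueeze(1), float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        # Each context position is re-encoded as a weighted sum over all positions
        return torch.bmm(attn, g)  # (batch, ctx_len, hidden)
```

Unlike Transformer-style self-attention, there are no separate query/key/value projections or multiple heads here, which matches the spirit of the lighter-weight variant described above.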

> Hi, I just published ONNX version with scripts to do the ONNX conversion here: https://huggingface.co/aapot/bge-m3-onnx

Thanks for your work. It seems to be a CPU-only version, right?
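One way to answer that question locally is to check which execution providers onnxruntime actually picks up. A minimal sketch, assuming the exported file is named `model.onnx` (the file name and path are assumptions, not taken from the linked repo):

```python
import onnxruntime as ort

# Providers available in the installed onnxruntime build
# (CPU-only builds will not list CUDAExecutionProvider)
print(ort.get_available_providers())

session = ort.InferenceSession(
    "bge-m3-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU if no GPU build
)

# Shows which provider was actually selected for this session
print(session.get_providers())
```

If only `CPUExecutionProvider` appears, the limitation is usually the installed onnxruntime package rather than the exported model itself.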

Hi @hiyouga Are there any merge blockers on this PR? I'm running SFT on Qwen2.5 for a long-context task, and I think sequence parallelism would help a lot in accelerating it. If...

> @shiningliang This PR diverges from LLaMA-Factory's last release v0.9.1. For now, known errors with SP are with multi-modal data & models. Pure text models should work well.

Hi @HaoshengZou...

I hit the same issue when trying Qwen2.5 sequence parallel and fixed it by **downgrading transformers to 4.42.4**. Will the owner migrate the code to support different versions of transformers?
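Until the code is migrated, a simple guard can catch the incompatible version early. This is a sketch of an assumed helper (the function name and the "only 4.42.4 works" assumption come from the workaround above, not from the repo):

```python
from packaging import version
import transformers

# Version reported to work with the sequence-parallel patch
KNOWN_GOOD = version.parse("4.42.4")

def sequence_parallel_supported() -> bool:
    # Newer transformers releases changed internals the SP patch relies on,
    # so only the pinned version is treated as known-good here.
    return version.parse(transformers.__version__) == KNOWN_GOOD

if not sequence_parallel_supported():
    raise RuntimeError(
        f"transformers=={transformers.__version__} detected; "
        "downgrade with `pip install transformers==4.42.4` before enabling sequence parallel."
    )
```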

Sequence parallel needs transformers

> > Hello, will the training pipeline support TP, PP, etc.?
>
> Optimizations supported by Megatron will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.

Is there any hope of support this year? 😂