cgoe comments

Results: 15 comments by cgoe

How does mamba support cross attention?

@Zeno673 Hello, we evaluate Mamba as a bidirectional multi-modal encoder in our recent work: [video-mamba-suite](https://github.com/OpenGVLab/video-mamba-suite). We find that directly concatenating textual and visual tokens can effectively perform cross-modal interaction,...
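To make the concatenation idea concrete, here is a minimal sketch in plain Python. It is not the video-mamba-suite implementation: `causal_scan` is a hypothetical stand-in (a running exponential average) that only mimics the property that a scan output depends on the current and earlier tokens, and the 2-dimensional embeddings are made up. It shows why scanning the concatenated sequence in both directions lets every token pick up information from the other modality.

```python
# Toy stand-in for a state-space scan: a running exponential average.
# This is NOT Mamba; it only mimics the property that each output
# depends on the current token and the tokens before it in scan order.
def causal_scan(tokens, decay=0.5):
    state = [0.0] * len(tokens[0])
    outputs = []
    for tok in tokens:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, tok)]
        outputs.append(state)
    return outputs


def bidirectional_scan(tokens, decay=0.5):
    # Scan left-to-right and right-to-left, then sum the two results per token.
    forward = causal_scan(tokens, decay)
    backward = causal_scan(tokens[::-1], decay)[::-1]
    return [[f + b for f, b in zip(fw, bw)] for fw, bw in zip(forward, backward)]


# Hypothetical embeddings: text carries signal in dim 0, visual in dim 1.
text_tokens = [[1.0, 0.0], [1.0, 0.0]]
visual_tokens = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

# Concatenate along the sequence axis: text first, then visual.
joint = text_tokens + visual_tokens

forward_only = causal_scan(joint)
mixed = bidirectional_scan(joint)

print("first text token, forward only: ", forward_only[0])
print("first text token, bidirectional:", mixed[0])
```

With the forward scan alone, the first text token has no visual component, because the visual tokens come later in the sequence. After the bidirectional scan, both text and visual positions carry a contribution from the other modality.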

[Help requested] Inference InternVideo2_clip model.

> @cg1177, could you summon @JustinYuu one more time? :DD

OK

Numerical errors in backward

Do you have any idea how to implement it?

What does Figure 5 mean?

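In principle, DP shards at the very start and only does an all-gather during backward, whereas SP has to communicate and shard online during attention. So SP should always be somewhat slower than DP, right? Is my understanding correct? Also, it seems that as the number of nodes increases, the efficiency of SP relative to DP gets lower and lower?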

[Potential Bug] KL compute in `low_var_kl` (Cause KL NaN and !!!!!!!!!!!!!!! output)

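> I hit this problem too when running vllm 0.8.2 on 2 nodes. Following someone's tip, I turned off vllm's V1 engine and everything worked normally afterwards. Concretely: set export VLLM_USE_V1=0, and do not set actor_rollout_ref.rollout.enforce_eager=False actor_rollout_ref.rollout.free_cache_engine=False in the training arguments.

May I ask what problem this fixes? In my case, once training reaches a plateau, the grad norm becomes NaN.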
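For runs launched from a Python entry point instead of a shell, the `VLLM_USE_V1=0` part of the quoted workaround could be sketched as below. This assumes the variable is set before vllm is imported and before any worker process is spawned; whether it reaches workers on other nodes depends on how the job is launched, which this sketch does not cover.

```python
import os

# Assumption: this runs at the very top of the entry script, before
# `import vllm` and before worker processes are started, so that the
# setting is visible when the engine is selected.
os.environ["VLLM_USE_V1"] = "0"

print("VLLM_USE_V1 =", os.environ["VLLM_USE_V1"])
```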