Yu-won Lee
@MahmoudElsayedMahmoud It might be a problem with gcc. https://github.com/microsoft/DeepSpeed/issues/4257 You could follow the solution here.
I am aware of the issue. I'll try to address it. Thanks.
I don't have access to my server right now, so may I ask whether zero2 also fails with mixed-modality training for you? It works with zero2 for me.
@haon-chen I thought of adding a dummy tensor, but if we add that, I think the model code should be changed so that it only creates the activation flow for the dummy and not...
@haon-chen Sorry for the late reply. Setting the `cross_attention_mask` to all zeros is a good idea. It could work similarly in other VLMs. I'll add a workaround based on it soon.
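A minimal sketch of the idea discussed above, not the code from the repository: for a text-only sample in a mixed-modality batch, pair the text with a blank dummy image and an all-zero `cross_attention_mask` so that no text token attends to it. The helper name, the tensor shapes, and the default sizes are assumptions chosen for illustration; the real shapes depend on the model and its processor.

```python
import torch


def make_text_only_vision_inputs(seq_len, num_tiles=4, channels=3, image_size=8):
    """Hypothetical helper: dummy vision inputs for a text-only sample.

    Shapes are illustrative assumptions, not taken from any specific model.
    """
    # One blank dummy image, so the vision inputs exist for this sample too.
    pixel_values = torch.zeros(1, num_tiles, channels, image_size, image_size)
    # All-zero mask: every text position is masked out from the dummy image.
    cross_attention_mask = torch.zeros(seq_len, 1, num_tiles, dtype=torch.long)
    return pixel_values, cross_attention_mask


pixel_values, cross_attention_mask = make_text_only_vision_inputs(seq_len=5)
```

Whether the model then ignores the dummy image entirely depends on how it consumes the mask, so this would need to be checked against the model code.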
I've updated the code to support mixed-modality data. There were some other issues when building the dataset, so I fixed those as well. I think it should work now.
@haon-chen Thanks for the great work!
The error could be caused by various things. Does the error occur every time? I've only tested with simple data, but I haven't seen that error yet.
@Tcc0403 Thank you for providing support for Qwen3-VL. I have a question specifically regarding the interaction between Liger Kernel and DeepSpeed ZeRO. After running several experiments, I noticed that: -...
Thanks for the clarification. Sorry for the confusion — I was referring specifically to Qwen3-VL. Assuming that DeepSpeed is always enabled in my setup, the issue I’m seeing is that...