Yu-won Lee

Results 230 comments of Yu-won Lee

@MahmoudElsayedMahmoud It might be a problem with gcc. You could follow the solution here: https://github.com/microsoft/DeepSpeed/issues/4257

I am aware of the issue. I'll try to address it. Thanks.

I don't have access to my server right now, so may I ask whether zero2 also fails with mixed-modality training? It works with zero2 for me.

@haon-chen I thought of adding a dummy tensor, but if we add that, I think the model code would have to be changed to create the activation flow only for the dummy and not...

@haon-chen Sorry for the late reply. Setting the `cross_attention_mask` to all zeros is a good idea. It could work similarly in other VLMs. I'll implement a workaround with it soon.

I've updated the code to support mixed-modality data. There were some other issues in building the dataset, so I've fixed those as well. I think it should work now.

@haon-chen Thanks for the great job!

The error could be caused by various things. Does the error occur every time? I've only tested with simple data, but I haven't seen that error yet.

@Tcc0403 Thank you for providing support for Qwen3-VL. I have a question specifically regarding the interaction between Liger Kernel and DeepSpeed ZeRO. After running several experiments, I noticed that: -...

Thanks for the clarification. Sorry for the confusion; I was referring specifically to Qwen3-VL. Assuming that DeepSpeed is always enabled in my setup, the issue I’m seeing is that...