Yu-won Lee comments

Results: 230 comments of Yu-won Lee

Why Is Special Handling Required for LoRA with ZeRO Stage 3?

@baiyongrui I was using Copilot at the time, and it seems the autocomplete set it to False. I fixed it right away, but it's unfortunate this issue has existed for...

GPU seems not used

It seems like you are using CPU offloading, which slows training down considerably. You could adjust the settings in the DeepSpeed config file to put some layers back...
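As a rough illustration of adjusting the config, here is a minimal sketch that removes the ZeRO offload sections from a DeepSpeed config dictionary. `offload_optimizer` and `offload_param` are DeepSpeed's ZeRO config keys; the helper name `drop_cpu_offload` and the example config are hypothetical and are not taken from the repository's actual config files.

```python
import copy


def drop_cpu_offload(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed config with the ZeRO offload sections removed.

    Hypothetical helper: it only edits the dictionary, it does not check
    whether the model still fits in GPU memory afterwards.
    """
    cfg = copy.deepcopy(ds_config)
    zero = cfg.get("zero_optimization", {})
    for key in ("offload_optimizer", "offload_param"):
        zero.pop(key, None)  # no error if the section is already absent
    return cfg


# Example config (an assumption, for illustration only).
example_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    }
}

gpu_only_config = drop_cpu_offload(example_config)
print(gpu_only_config)
```

Removing offloading entirely raises VRAM usage, so this only helps when the GPU has enough memory for the parameters and optimizer states.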

GPU seems not used

The settings written in the config files are not optimal for everyone. The less VRAM you use, the longer training takes. You should balance between 2...

Customize my own model

If the original model runs at the same speed, the slowdown is probably caused by something else, such as offloading, long context due to the image...

Customize my own model

Is the original transformer forward faster than the monkey-patched one? If so, you could just use the original one. It's just for applying the liger-kernel and training mixed-modality...
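One way to answer that question is to time both versions on the same input. The sketch below uses stand-in functions; `original_forward`, `patched_forward`, and `mean_seconds` are hypothetical names, not the repository's code. For real GPU work you would also call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.

```python
import time


def mean_seconds(fn, *args, warmup=2, iters=10):
    """Average wall-clock seconds per call of fn(*args), after a few warm-up calls."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters


# Stand-ins for the two forward implementations (assumptions for illustration).
def original_forward(x):
    return sum(v * v for v in x)


def patched_forward(x):
    return sum(v * v for v in x)


batch = list(range(1000))
t_orig = mean_seconds(original_forward, batch)
t_patch = mean_seconds(patched_forward, batch)
print(f"original: {t_orig:.6f}s  patched: {t_patch:.6f}s")
```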

Customize my own model

Yes, but I don't think that's the cause of the significant slowdown. I'll find a way to integrate `torch.compile` into my code. Thanks for letting me know!

Customize my own model

I don't know exactly what you are doing, so I have no idea what is causing the error. BTW, I've implemented `torch.compile`, but it doesn't work with flash-attention2. So I...

Customize my own model

You mean adding projectors in between the vision-encoder and the LLM? Then I think that could cause a huge error, because it doesn't project the image_embeddings properly at first. The...

Customize my own model

Hmm, maybe monkey-patching the liger-kernel modules is the problem. You could try removing those.

Customize my own model

Sorry for the late response. If you aren't using LoRA it won't be the original one. I'll look into it.