Yu-won Lee comments

Results: 230 comments of Yu-won Lee

There is an issue where it doesn't work on multi-GPU.

I made that suggestion because the bash file you posted above, the one you are using, has offloading enabled. I'll set up the environment again from scratch and try it.

Installing MS-AMP

Well, I've changed the server and it shows the same error now. I'm a bit confused about what the problem is. I was just using Docker with an older PyTorch version such...

Minimum GPU memory need for fine-tuning

My code isn't for unsloth, but I've added liger-kernel training. I haven't tested the minimum VRAM, but I've run it on 5 x V100 GPUs with CPU offloading. I'll try to...

Minimum GPU memory need for fine-tuning

@linkfzaro Liger-kernel is similar to unsloth in that it is optimized with Triton kernels. To the best of my knowledge, unsloth is a framework that optimizes the model with Triton kernels. That...

Minimum GPU memory need for fine-tuning

@Oguzhanercan You could comment out the line that uses liger_kernel, because it doesn't support QLoRA. https://github.com/2U1/Llama3.2-Vision-Finetune/blob/9c6821e95a6e962600ecc654b7e545d4a3dd316a/src/training/train.py#L71C5-L71C33 This is the line. I'll make an option for turning it on and off....
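
An on/off option like the one mentioned above could look roughly like the following. This is only a sketch: the flag names `--use_liger` and `--use_qlora` and the helper `liger_enabled` are hypothetical and are not taken from the repository. It assumes only what the comment states, that liger_kernel should not be applied together with QLoRA.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with two hypothetical training flags."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag: liger is on by default, --no-use_liger turns it off.
    parser.add_argument(
        "--use_liger", action=argparse.BooleanOptionalAction, default=True
    )
    # Hypothetical flag: enables QLoRA training.
    parser.add_argument("--use_qlora", action="store_true")
    return parser


def liger_enabled(args: argparse.Namespace) -> bool:
    """Apply liger only when it is requested and QLoRA is not in use."""
    return bool(args.use_liger and not args.use_qlora)
```

With this shape, the training script would patch the model with liger only when `liger_enabled(args)` is true, so QLoRA users would not need to edit the source by hand.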

Minimum GPU memory need for fine-tuning

@Oguzhanercan Offloading takes much longer than non-offloading. I'll find some other way to improve the speed.

UserWarning: None of the inputs have requires_grad=True. Gradients will be None

That's odd, since I enabled gradients for the input embeddings. It is caused by gradient checkpointing. I'll look into this.

UserWarning: None of the inputs have requires_grad=True. Gradients will be None

@gumona12 Okay, I found the reason. Some of the gradients are set to False when you freeze the vision tower while using gradient checkpointing. It doesn't matter when training, so it just...

I try to start training with bash scripts/finetune.sh

I think your CUDA version and the one in your environment are a bit different. Can you reinstall torch for your current CUDA version and retry?
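
One way to check for this kind of mismatch is to compare the CUDA version torch was built with (reported by `torch.version.cuda`) against the toolkit version printed by `nvcc --version`. The helper below is a minimal sketch, not part of the repository: the function names are hypothetical, and treating matching major versions as "compatible" is a simplifying assumption rather than an official rule.

```python
import re
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as '12.1' or 'release 12.0, V12.0.76'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_versions_compatible(torch_cuda: str, system_cuda: str) -> bool:
    """Heuristic (assumption): compatible when the CUDA major versions agree."""
    built = parse_major_minor(torch_cuda)
    installed = parse_major_minor(system_cuda)
    if built is None or installed is None:
        # A missing or unparsable version is treated as a mismatch.
        return False
    return built[0] == installed[0]
```

In practice, the first argument would be the string from `torch.version.cuda` and the second the release line from `nvcc --version`. If the check fails, reinstalling torch for the installed CUDA version is the step suggested above.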

I try to start training with bash scripts/finetune.sh

The error message says you have CUDA 12.0. That's odd. I think it's some kind of version issue. I'll try to figure it out.