Gavin Li

Results: 38 comments by Gavin Li

The Mac version doesn't support QWen yet; it only supports Llama/Llama 2 series models. We'll add support later.

Can you provide a stack trace?

Can you please provide the whole source code file?

I'll try, but my understanding is that the bottleneck isn't there. The current bottleneck is loading the model from disk into GPU memory; batching more layers most likely won't help.
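The layer-by-layer execution pattern behind this comment can be sketched as follows (a minimal illustration, not the project's actual code; `load_layer` and `forward` are hypothetical callables). Timing the two phases separately shows why batching more layers into one forward pass wouldn't help when the load phase dominates:

```python
import time

# Hypothetical sketch of layer-by-layer inference: each transformer layer
# is loaded from disk, run, then freed. The per-layer wall time splits into
# a load phase (disk -> GPU memory) and a compute phase; if load_s >> compute_s,
# batching more layers per forward pass cannot remove the bottleneck.
def run_layer_by_layer(layer_paths, load_layer, forward):
    load_s = compute_s = 0.0
    hidden = None
    for path in layer_paths:
        t0 = time.perf_counter()
        layer = load_layer(path)         # disk -> memory: the suspected bottleneck
        load_s += time.perf_counter() - t0

        t0 = time.perf_counter()
        hidden = forward(layer, hidden)  # actual compute: comparatively cheap
        compute_s += time.perf_counter() - t0

        del layer                        # free the layer before loading the next
    return hidden, load_s, compute_s
```

Comparing `load_s` against `compute_s` from such a loop is the quickest way to confirm where the time actually goes.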

> torch.cuda.synchronize()

Great job. Yes, I'll fix the profiling and look into a few possible improvements.
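The `torch.cuda.synchronize()` point matters because CUDA kernels launch asynchronously: a naive `time.perf_counter()` pair around a GPU call measures only the kernel *launch*, not its execution. A minimal sketch of a timing helper (the `sync` parameter is my own generalization, so the helper also runs on CPU-only machines):

```python
import time

# Sketch of correct wall-clock timing around asynchronously executed work.
# Pass sync=torch.cuda.synchronize when timing CUDA code; with sync=None the
# helper degrades to plain CPU timing.
def timed(fn, *args, sync=None):
    if sync:
        sync()                    # drain any previously queued GPU work
    t0 = time.perf_counter()
    out = fn(*args)
    if sync:
        sync()                    # wait for the kernels fn launched to finish
    return out, time.perf_counter() - t0
```

On a CUDA machine this would be used as `timed(model, inputs, sync=torch.cuda.synchronize)`; without the two synchronize calls, the measured time can be misleadingly small.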

> @lyogavin i tried this out today. I have a suggestion here. What i noticed is the GPU is not utilized fully in this case. For example ![Screenshot 2023-11-30 at...

> Also take a look at this recent blog from pytorch with optimization [strategies](https://pytorch.org/blog/accelerating-generative-ai-2/). > > `Torch.compile allows us to capture a larger region into a single compiled region, and...

Can you provide more info? Which Hugging Face model repo ID are you using? Also, can you check whether you have enough disk space?

OK... it's a LoRA model... We'll look into how to support this. Thanks.