Gavin Li
The Mac version doesn't support Qwen yet; it only supports the Llama/Llama 2 series of models. We'll add support later.
Can you provide a stack trace?
We can do it. Will add it.
Can you please provide the whole source code file?
I'll try, but my understanding is that the bottleneck is not there. The current bottleneck is loading the model from disk into GPU memory; batching more layers most likely won't help.
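To see why per-layer disk loading dominates, here's a rough back-of-envelope sketch. The layer size and disk throughput numbers below are my own assumptions for illustration, not measured values from AirLLM:

```python
# Rough estimate of per-layer load time when streaming layers from disk.
# Both numbers are assumptions: ~0.9 GB per fp16 layer of a 70B-class
# model, and ~2 GB/s sustained NVMe read throughput.
layer_bytes = 0.9e9   # assumed bytes per layer (fp16)
disk_bps    = 2.0e9   # assumed disk read throughput, bytes/second

load_secs = layer_bytes / disk_bps
print(f"per-layer load ≈ {load_secs * 1e3:.0f} ms")  # → per-layer load ≈ 450 ms
```

Hundreds of milliseconds of I/O per layer dwarfs the few milliseconds of GPU compute on that layer, which is why batching more layers onto the GPU doesn't move the needle.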
> torch.cuda.synchronize()

Great job. Yes, I'll fix the profiling and look into a few possible improvements.
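For reference, the pitfall the quote points at: CUDA kernels launch asynchronously, so a wall-clock timing without `torch.cuda.synchronize()` measures only the kernel launch, not its execution. A minimal sketch of the corrected pattern (the `timed` helper is my own illustration, not an AirLLM API):

```python
import time
import torch

def timed(fn, *args):
    """Time fn, synchronizing the GPU before and after so the
    measurement covers actual kernel execution, not just launch."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1024, 1024, device=device)
_, secs = timed(torch.matmul, a, a)
print(f"matmul took {secs * 1e3:.2f} ms")
```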
> @lyogavin I tried this out today. I have a suggestion here. What I noticed is the GPU is not utilized fully in this case. For example:
>
> `Torch.compile allows us to capture a larger region into a single compiled region, and...`
Can you provide more info? Which Hugging Face model repo ID are you using? Also, can you check whether you have enough disk space?
OK... It's a LoRA model... We'll look into how to support this. Thanks.