AlpinDale
Great work, @jaemzfleming. It seems the kernels are too inefficient - it takes 10 minutes to load a 1x16 70b on a 3090, and ~4 minutes for 2x8. Have you...
The code still needs testing before an attempt at implementation is made. I have not tested it yet - I'm not 100% sure I've got the layer names correct. *Theoretically*...
Can confirm that the GPTQ implementation for the GPT-J 6B model (and any model fine-tuned off of it, such as [Pygmalion 6B](https://huggingface.co/PygmalionAI/pygmalion-6b)) seems to be working perfectly.
Needs either a TPU or GPUs (NVIDIA/AMD only), and there have to be 8 devices.
> @SparkJiao Sorry but what do you mean by `zero=0`?
>
> By the way, I just find that removing model.cuda() or model.eval() help me to solve the multiplication error:...
Your LoRA rank might be too high (`r = 128`). I wouldn't recommend going above an effective batch size of `1` either; it seems to negatively affect the train loss...
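
For reference, a minimal sketch of a lower-rank adapter configuration, assuming the comment refers to the `peft` library's `LoraConfig` (the rank, alpha, and target module names below are illustrative, not a recommendation from the original thread):

```python
from peft import LoraConfig, get_peft_model

# Hypothetical lower-rank adapter config; target module names depend on the base model.
lora_config = LoraConfig(
    r=16,                                  # adapter rank, much lower than r=128
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # illustrative attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)
```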
> @AlpinDale Is the effective batch size equal to the value of `per_device_train_batch_size`?

Effective batch size is equal to `per_device_train_batch_size` * `gradient_accumulation_steps`.
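
A minimal sketch of how those two arguments combine on a single device, assuming the Hugging Face `transformers` `TrainingArguments` API (the specific values are illustrative):

```python
from transformers import TrainingArguments

# Illustrative values; their product is the effective batch size on one device.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
)

effective_batch_size = args.per_device_train_batch_size * args.gradient_accumulation_steps
print(effective_batch_size)  # 4 * 8 = 32
```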
@Tostino Yes. Keep in mind though that an effective batch size of 1 results in a *very* slow training time.
Currently having this issue as well. The `CUDA_VISIBLE_DEVICES` environment variable has no effect either, and it only loads the models to GPU 0. I'm running on A100s but still get...
> @dcruiz01 @SunixLiu @AlpinDale vLLM is designed to take almost all of your GPU memory. Could you double-check your GPU is not used by other processes when using vLLM? Thanks,...
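
For context, a minimal sketch of capping how much memory vLLM reserves, assuming the `gpu_memory_utilization` argument of `vllm.LLM` (the model name and fraction here are illustrative):

```python
from vllm import LLM

# Ask vLLM to reserve roughly 50% of GPU memory instead of its default ~90%.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)

outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```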