If I run the following command:

```
accelerate launch -m lm_eval --model hf --model_args "pretrained=TheBloke/Llama-2-7B-Chat-GPTQ,gptq=True,load_in_4bit=True" --tasks "arc_challenge" --num_fewshot 25 --batch_size auto
```

I get the following error:

```
ValueError: You...
```
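A likely cause is that the GPTQ checkpoint already ships its own `quantization_config`, so also requesting bitsandbytes 4-bit loading (`load_in_4bit=True`) gives transformers two competing quantization configs. A minimal sketch of the same load in plain transformers, under that assumption (only the model ID comes from the question):

```python
from transformers import AutoModelForCausalLM

# Hypothesis: the checkpoint's baked-in GPTQ quantization_config conflicts
# with the bitsandbytes load_in_4bit flag, raising the ValueError. Loading
# with the GPTQ config alone avoids the clash.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GPTQ",
    device_map="auto",  # GPTQ kernels run on GPU
)
```

The matching lm_eval invocation would then simply drop `load_in_4bit=True` from `--model_args`.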
**Describe the bug** I am trying to run AutoGPTQ on Llama-7B on an RTX 4090 with num_samples > 128 but run out of memory (OOM). I thought that the number of samples would not...
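For context, GPTQ-style quantizers cache each layer's inputs for every calibration sample, so GPU memory does grow with `num_samples`. A hedged sketch of one mitigation, assuming a recent AutoGPTQ release where `quantize()` accepts a `cache_examples_on_gpu` flag (model ID and calibration text are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder Llama-7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder calibration set; a real run would use varied text samples.
examples = [
    tokenizer("The quick brown fox jumps over the lazy dog.",
              return_tensors="pt")
    for _ in range(256)
]

model = AutoGPTQForCausalLM.from_pretrained(
    model_id, BaseQuantizeConfig(bits=4, group_size=128)
)

# Layer inputs are cached for every calibration sample, so memory scales
# with num_samples; keeping that cache on CPU trades speed for GPU headroom.
model.quantize(examples, cache_examples_on_gpu=False)
```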
I have loaded the model `TheBloke/Llama-2-7B-Chat-GPTQ` from Hugging Face, and its final linear layer (`lm_head`) is a standard, unquantized linear layer. Is there a way to quantize it too? Would performance drastically decrease?
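A quick check confirms what the question describes (a sketch, assuming a transformers install with GPTQ support that can load this checkpoint): the conversion quantizes the attention/MLP projections but leaves the output head as a plain `torch.nn.Linear`.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GPTQ", device_map="auto"
)

# Expected: a plain Linear for the head, a quantized module inside the blocks.
print(type(model.lm_head))
print(type(model.model.layers[0].self_attn.q_proj))
```

GPTQ tooling commonly leaves `lm_head` in full precision because quantizing the output projection tends to hurt accuracy more than quantizing the inner layers.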
**Describe the bug** Running FP8 PTQ of Llama3-8B on 1x RTX 4090 (24 GB) runs out of memory (OOM). Is this expected? vLLM's FP8 quantization works on the same GPU. What are the minimum requirements...
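For reference, a minimal FP8 PTQ recipe along these lines, written against llm-compressor's documented `oneshot` + `QuantizationModifier` API (the checkpoint name is a placeholder, and memory behavior may differ across versions):

```python
from transformers import AutoModelForCausalLM
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# device_map="auto" lets accelerate spill layers to CPU when 24 GB is not
# enough for the BF16 weights plus the quantization workspace.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder Llama3-8B checkpoint
    torch_dtype="auto",
    device_map="auto",
)

# FP8 dynamic quantization needs no calibration data, so the dominant
# memory cost is holding the model itself.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)
oneshot(model=model, recipe=recipe)
```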
**Describe the bug** I am trying to quantize Llama3.1 using GPTQ but encounter an error complaining that tensors are split across CPU and GPU. This used to work for Llama3 on...
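One common source of mixed CPU/GPU tensor errors during GPTQ calibration is `device_map="auto"` sharding the model across devices. As a hedged workaround sketch (model ID assumed; this does not diagnose any Llama3.1-specific regression), pin everything to one device before quantizing:

```python
import torch
from transformers import AutoModelForCausalLM

# Load fully onto one device so calibration activations and weights never
# end up split between CPU and GPU. Requires enough VRAM for FP16 weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")
```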