SpQR
Will model saving be supported in the future?
Hi, thank you for sharing your work. The reproduced perplexity on the PTB dataset using your code does not match the paper. The reproduced value is 27.8, while in the paper...
Eagerly awaiting access to the inference code!
Hello, I have a question: I currently have a LLaMA-series model that has been fine-tuned on my own dataset. If I want to quantize it with SpQR, do...
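For questions like this one (and the NLLB one below), the usual preparation step is to export the fine-tuned model in Hugging Face format so its directory can be passed as the model path to the repository's quantization script. The sketch below only covers that export; the paths are placeholders, and the actual quantization flags (bit width, group size, outlier threshold) are the ones documented in the README, not assumed here.

```python
# Minimal sketch: save a fine-tuned LLaMA-series model in Hugging Face format
# so its directory can be supplied as the model path to the SpQR quantization
# entry point. Checkpoint and output paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

finetuned_checkpoint = "path/to/my-finetuned-llama"  # placeholder
export_dir = "my-finetuned-llama-hf"                 # placeholder

model = AutoModelForCausalLM.from_pretrained(finetuned_checkpoint, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(finetuned_checkpoint)

# The exported directory is then passed to the repository's quantization
# script in place of a hub model name.
model.save_pretrained(export_dir)
tokenizer.save_pretrained(export_dir)
```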
failed to find the inference code
Any way we can test the speedup effect?
I've been trying to run quantization for Falcon-40B on a box with eight 40 GiB A100s, but I keep getting CUDA out-of-memory errors. The README states that this should be possible, unless...
I tried running the code from your repository; however, when I added the --save_safetensors flag, the process was interrupted after the evaluation. I didn't encounter any...
Hi @Vahe1994, I have fine-tuned Facebook's NLLB model on my custom dataset for language translation. Could you provide a guideline on how to perform SpQR quantization of this fine-tuned model?...
Hi, I was wondering if you folks could provide SpQR-quantized model weights for OpenLLaMA? OpenLLaMA is Apache-2.0 licensed and reportedly performs close to the original LLaMA on benchmarks. Thanks....