John Robinson

Results: 18 comments by John Robinson

I'm comparing flan-t5-xxl int8 quant using this script (will fit on a 24G card): https://github.com/johnrobinsn/flan_ul2/blob/main/infer-flan-ul2-int8-basemodel.py against flan-t5-xxl int4 using this script: https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/t5/t5_inference.py and yeah... pretty bad results with int4... in...
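
For reference, here's a minimal sketch of the int8 loading approach (assuming the usual transformers + bitsandbytes path; the linked script may differ in its details):

```python
# Minimal int8 loading sketch for flan-t5-xxl, assuming transformers and
# bitsandbytes are installed. Illustrative; not the linked script verbatim.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-xxl"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load_in_8bit quantizes the linear layers via bitsandbytes at load time,
# which is what lets the ~11B-param model fit on a single 24G card.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Translate to German: Hello.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```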

@jasontian6666 by flan-eval, you're talking about this...? thx https://github.com/declare-lab/flan-eval

They have a --load_8bit flag that I'm trying now with flan-t5-xxl.

Just collecting some details on the performance of int8 quant vs int4 quant for **t5 models** (not llama). Using [flan-eval](https://github.com/declare-lab/flan-eval) to eval **int8 quant** performance: int8;flan-t5-xxl;mmlu => Average accuracy: 0.544...
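
For context, this is roughly what an MMLU-style eval does per question; an illustrative sketch only (not flan-eval's actual code, and `score_mmlu_example` is a hypothetical helper):

```python
# Illustrative only: not flan-eval's actual implementation.
# Assumes `model`/`tokenizer` are the int8-loaded pair from the snippet above.
def score_mmlu_example(model, tokenizer, question, options, answer):
    letters = ["A", "B", "C", "D"]
    prompt = (
        question
        + "\n"
        + "\n".join(f"{letter}. {opt}" for letter, opt in zip(letters, options))
        + "\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=2)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip()
    # The reported average accuracy is just the mean of these booleans
    # across the benchmark.
    return prediction.startswith(answer)
```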

The flan models are already finetuned... so even before any additional finetuning, we're seeing the degradation captured above. @bradfox2 that's a good thought, though; increasing the sampling rate etc. might improve...
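
(My reading of "sampling rate" here is GPTQ's calibration-sample count: GPTQ estimates a per-layer Hessian from sample activations, so more samples should tighten the estimate. A rough sketch of that statistic below; names are illustrative, not the GPTQ-for-LLaMa API.)

```python
# Rough sketch of GPTQ's per-layer calibration statistic; illustrative
# names, not the GPTQ-for-LLaMa API.
import torch

def hessian_estimate(layer_inputs: list) -> torch.Tensor:
    """Accumulate H ~= (2 / n_rows) * sum(x x^T) over calibration rows.

    More calibration inputs -> more rows -> a less noisy H, which is the
    intuition behind bumping the sample count to help int4 quality.
    """
    d = layer_inputs[0].shape[-1]
    H = torch.zeros(d, d)
    n_rows = 0
    for x in layer_inputs:
        x = x.reshape(-1, d).float()
        H += 2.0 * x.T @ x
        n_rows += x.shape[0]
    return H / max(n_rows, 1)
```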

@jasontian6666 @bradfox2 I'm watching this: https://github.com/qwopqwop200/GPTQ-for-LLaMa/pull/189; it might help t5.

@bradfox2 yeah... I would like to, although it might take me a week or so to circle back to it. It will need to be merged into the t5 branch...

Also let me know if there is a better spot to discuss topics like this...