
GPT-NeoX-20B model takes 1 minute to generate 100 tokens with 4-bit quantization

Open imrankh46 opened this issue 2 years ago • 0 comments

How can I reduce the generation time when producing more than 100 tokens? The model takes 1 minute to generate 100 tokens when loaded with 4-bit quantization.
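To put the reported figure in comparable terms, here is a minimal sketch that converts it into throughput and extrapolates to longer outputs. The helper names `tokens_per_second` and `estimated_seconds` are hypothetical, and the extrapolation assumes the per-token cost stays constant, which may not hold as the context grows.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput for a generation run (hypothetical helper)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def estimated_seconds(num_tokens: int, rate: float) -> float:
    """Estimated wall time for num_tokens at a fixed rate.

    Assumption: generation time scales linearly with token count.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / rate


# Figures reported above: 100 tokens in 1 minute.
rate = tokens_per_second(100, 60.0)
print(f"{rate:.2f} tokens/s, {1 / rate:.2f} s per token")
print(f"500 tokens would take about {estimated_seconds(500, rate):.0f} s")
```

So the reported speed is roughly 1.67 tokens per second, or about 0.6 seconds per token.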

imrankh46, May 25 '23 12:05