GPT-NeoX-20B takes about 1 minute to generate 100 tokens when loaded with 4-bit quantization. How can I reduce the generation time for 100 or more tokens?
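For reference, here is a minimal sketch of how such a setup is typically loaded and timed with Hugging Face `transformers` and `bitsandbytes`. The model name and generation parameters are assumptions based on the question, not a confirmed reproduction of the original setup; running this requires a CUDA GPU and the `transformers`, `accelerate`, `bitsandbytes`, and `torch` packages. Common levers for speed, under those assumptions, are keeping the whole model on GPU (`device_map="auto"` with enough VRAM, since any CPU offload is very slow), setting a fp16/bf16 compute dtype for the 4-bit layers, enabling the KV cache, and batching prompts.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config; bnb_4bit_compute_dtype matters for speed,
# because matmuls are done in this dtype after dequantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute is much faster than fp32
    bnb_4bit_quant_type="nf4",
)

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # ideally the full model fits on GPU; CPU offload is slow
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(
    **inputs,
    max_new_tokens=100,
    use_cache=True,  # reuse the KV cache instead of recomputing past tokens
    do_sample=False,
)
elapsed = time.perf_counter() - start

n_new = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{n_new} tokens in {elapsed:.1f}s ({n_new / elapsed:.2f} tokens/s)")
```

As a sanity check: 100 tokens in 60 seconds is about 1.7 tokens/s, which usually points to CPU offload or fp32 compute rather than the quantization itself. If throughput is still too low after the settings above, a dedicated inference engine (e.g. vLLM or text-generation-inference) is the usual next step.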