KoboldAI
GPTQ models load much slower than 0cc4m's fork used to
The slow loading happens every time a new chat message is generated; it is particularly noticeable with 30B models, but also noticeable with 13B.
There are currently two issues going on:
- Occam's GPTQ fork is not fully compatible with newer Hugging Face builds, so KoboldAI falls back to AutoGPTQ in more cases than before.
- AutoGPTQ itself has an outstanding issue where slow loading occurs on some systems, which they still need to address.
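To illustrate the first point, here is a minimal sketch of what that backend selection looks like in principle: try the fast fork first and drop down to AutoGPTQ when it fails to load. The function and module names are illustrative stand-ins, not KoboldAI's actual API.

```python
# Hypothetical sketch of the fallback described above: prefer the
# fork's fast GPTQ loader, fall back to AutoGPTQ on incompatibility.
# Names below are placeholders, not KoboldAI's real loader functions.

def load_with_fork(model_path: str):
    """Stand-in for Occam's GPTQ loader; raises when the installed
    huggingface build is incompatible with the fork."""
    raise ImportError("fork incompatible with this transformers build")

def load_with_autogptq(model_path: str):
    """Stand-in for the AutoGPTQ loader (slower on some systems)."""
    return f"autogptq::{model_path}"

def load_gptq_model(model_path: str):
    # Any compatibility failure in the fork silently drops us down to
    # AutoGPTQ, which is why loads got slower without an obvious error.
    try:
        return load_with_fork(model_path)
    except ImportError:
        return load_with_autogptq(model_path)
```

With an incompatible fork, every load quietly takes the AutoGPTQ path, matching the behavior reported here.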
So on the Kobold side we are waiting for either GPTQ package to get the needed updates before full speed is restored.