lhs1012
> It is recommended to use exl2, gptq or awq over gguf. The support for gguf (especially sharded gguf) is unfinished.

Oh, I see. Thank you for the reply!
@sgsdxzy Thank you for the update. I tried to test the `dev` branch, but while the documentation says "The dev branch extends support for GGUF to all available model architectures besides..."
The conversion succeeded, but I got this error when running the model with `aphrodite run /mnt3/.cache/huggingface/hub/models--command-r-plus-gguf -tp 2`:

```
WARNING: gguf quantization is not fully optimized yet. The speed can be...
```
Still the same error with v0.5.3, and also with the current main branch.