AlpinDale
QuIP# needs more polishing, even as of v0.6.0. Bump here so I can work on it again for the next release.
`--swap-space` should handle this
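For context, a minimal sketch of how the flag is typically passed (the entrypoint and model name below are placeholders, not from this thread):

```bash
# Sketch only: reserve 8 GiB of CPU swap space per GPU so preempted
# sequences can be swapped out instead of recomputed. Model name is a placeholder.
python -m aphrodite.endpoints.openai.api_server \
    --model <your-model> \
    --swap-space 8
```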
Bump so I can explore this for our next release.
You can try the FP8 KV cache or chunked prefill (mutually exclusive for now): `--kv-cache-dtype fp8` or `--enable-chunked-prefill`.
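Roughly, either of these launch lines should work (entrypoint and model are placeholders; don't pass both flags together for now):

```bash
# Option A: quantize the KV cache to FP8
python -m aphrodite.endpoints.openai.api_server --model <your-model> --kv-cache-dtype fp8

# Option B: enable chunked prefill (currently can't be combined with the FP8 KV cache)
python -m aphrodite.endpoints.openai.api_server --model <your-model> --enable-chunked-prefill
```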
Seems like your aphrodite installation isn't being recognized in the environment; 
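As a generic sanity check (not specific to this thread), you could confirm which interpreter is active and whether the package imports from it:

```bash
# Verify the active interpreter and that aphrodite is importable from it
which python
python -c "import aphrodite; print(aphrodite.__file__)"
pip show aphrodite-engine   # PyPI package name assumed; skip if installed from source
```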
Can you add it to `tests/weight_loading/models.txt` too? Thanks!
Running this PR with the latest main branch merged gives this error (tensor_parallel_size=2):

```
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 156, in forward
    q, k = self._apply_qk_norm(q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 138,...
```
Try removing the `--api-keys` arg. Setting up the Kobold UI with an API key is more involved.
This seems to happen with some specific models; I'll investigate soon. Sorry for not getting back to you sooner!
Hi @Isotr0py, I can help with this PR if needed. I've already done some work implementing all GGUF quants + related kernels in vLLM. Let me know if you'd like...