AlpinDale

170 comments by AlpinDale

QuIP# needs more polishing, even as of v0.6.0. Bump here so I can work on it again for the next release.

You can try the FP8 KV cache or chunked prefill (the two are mutually exclusive for now): `--kv-cache-dtype fp8` or `--enable-chunked-prefill`.
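If you'd rather use the Python API, here's a minimal sketch of the same options, assuming aphrodite mirrors vLLM's `LLM` entrypoint and that `kv_cache_dtype`/`enable_chunked_prefill` correspond to the CLI flags above; the model name is just a placeholder:

```python
from aphrodite import LLM

# FP8 KV cache (equivalent of --kv-cache-dtype fp8); model name is a placeholder.
llm = LLM(model="some/model", kv_cache_dtype="fp8")

# ...or chunked prefill (equivalent of --enable-chunked-prefill).
# The two options are mutually exclusive for now, so enable only one of them.
# llm = LLM(model="some/model", enable_chunked_prefill=True)
```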

Seems like your aphrodite installation isn't being recognized in the environment: ![image](https://github.com/PygmalionAI/aphrodite-engine/assets/52078762/2929a2b6-5b42-43ef-b3b7-70ddd52a7d3e)
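A quick way to confirm which environment (if any) can actually see the install is a sketch like the one below; it only relies on standard package attributes, not on any aphrodite-specific API:

```python
# Run this with the same Python interpreter you launch the engine with.
# If the import fails, that interpreter's environment doesn't have aphrodite installed.
import aphrodite

# __file__ shows which site-packages the import resolved to.
print(aphrodite.__file__)
```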

Can you add it to tests/weight_loading/models.txt too? Thanks

Running this PR with the latest main branch merged gives this error (tensor_parallel_size=2):

```
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 156, in forward
    q, k = self._apply_qk_norm(q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 138,...
```

Try removing the `--api-keys` arg. Setting up the Kobold UI with an API key is more involved.

This seems to happen with some specific models; I'll investigate soon. Sorry for not getting back to you sooner!

Hi @Isotr0py, I can help with this PR if needed. I've already done some work implementing all GGUF quants + related kernels in vLLM. Let me know if you'd like...