AlpinDale

170 comments by AlpinDale

QuIP# needs more polishing, even as of v0.6.0. Bump here so I can work on it again for the next release.

You can try the FP8 KV cache or chunked prefill (the two are mutually exclusive for now): `--kv-cache-dtype fp8` or `--enable-chunked-prefill`.
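If you'd rather use the Python API, here's a minimal sketch of the same options, assuming aphrodite mirrors vLLM's `LLM` entrypoint and that `kv_cache_dtype`/`enable_chunked_prefill` correspond to the CLI flags above; the model name is just a placeholder:

```python
from aphrodite import LLM

# FP8 KV cache (equivalent of --kv-cache-dtype fp8); model name is a placeholder.
llm = LLM(model="some/model", kv_cache_dtype="fp8")

# ...or chunked prefill (equivalent of --enable-chunked-prefill).
# The two options are mutually exclusive for now, so enable only one of them.
# llm = LLM(model="some/model", enable_chunked_prefill=True)
```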

Seems like your aphrodite installation isn't being recognized in the environment: ![image](https://github.com/PygmalionAI/aphrodite-engine/assets/52078762/2929a2b6-5b42-43ef-b3b7-70ddd52a7d3e)
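A quick way to confirm which environment (if any) can actually see the install is a sketch like the one below; it only relies on standard package attributes, not on any aphrodite-specific API:

```python
# Run this with the same Python interpreter you launch the engine with.
# If the import fails, that interpreter's environment doesn't have aphrodite installed.
import aphrodite

# __file__ shows which site-packages the import resolved to.
print(aphrodite.__file__)
```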

Can you add it to tests/weight_loading/models.txt too? Thanks

Running this PR with the latest main branch merged gives this error (tensor_parallel_size=2):

```
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 156, in forward
    q, k = self._apply_qk_norm(q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 138,...
```

Try removing the `--api-keys` arg. Setting up the Kobold UI with an API key is more involved.

This seems to happen with some specific models; I'll investigate soon. Sorry for not getting back to you sooner!

Hi @Isotr0py, I can help with this PR if needed. I've already done some work implementing all GGUF quants + related kernels in vLLM. Let me know if you'd like...