AlpinDale comments

Results: 170 comments by AlpinDale

[Bug]: WSL Cuda out of Memory when Trying to Load GGUF Model

Sorry, I've been away for a while. Have you tried the Docker image? This is probably a WSL issue. GPU Docker on Windows uses WSL too, but who knows...

[Feature]: any workarounds for cc 6.0?

I'll be looking into this before the 0.5.3 release. Should be doable.

[Bug]: multi GPU crashes backend

The error log isn't very helpful. It may give you more info if you kill the server (async moment). It could be due to an internal timeout, but it's hard to tell...

[Bug]: multi GPU crashes backend

Seems like a timeout error. Did you have a sequence that took longer than 60 seconds to process? As a hotfix, you can increase the timeout threshold: `export APHRODITE_ENGINE_ITERATION_TIMEOUT_S=120`...
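A minimal sketch of the hotfix above, assuming the variable is exported in the same shell session that later starts the server (the launch command itself is not shown in this excerpt, so it is left out here):

```sh
# Raise the engine iteration timeout above the 60-second threshold mentioned above.
export APHRODITE_ENGINE_ITERATION_TIMEOUT_S=120

# Confirm the value is exported, so any process started from this shell inherits it.
printenv APHRODITE_ENGINE_ITERATION_TIMEOUT_S
```

Because the variable is exported rather than just assigned, it only affects processes launched from this shell afterwards; a server that is already running would need to be restarted to pick it up.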

[Bug]: multi GPU crashes backend

Most issues were fixed in v0.6.0.

[Feature]: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

I doubt this applies to inference as much as it does to training. Admittedly, I haven't given the paper a thorough read yet.

[Bug]: Outlines json guided decoding

Hi, sorry, I totally missed this issue! Can you run the Docker container in privileged mode?

[Bug]: Outlines json guided decoding

We already resolved a similar issue related to Triton; it should be fixed in the latest Docker image. Have you tried it?

Bad generation with GGUF and OpenAI api

Can confirm this happens with Mixtral. Investigating.

[Misc]: Building docker container requires insane amount of memory

Where did you set the MAX_JOBS variable? It should be set in the Dockerfile right before the build command towards the end.