AlpinDale
Sorry, I've been away for a while. Have you tried the Docker image? This is probably a WSL issue. GPU Docker on Windows uses WSL too, but who knows...
I'll be looking into this before the 0.5.3 release. Should be doable.
The error log isn't very helpful. It may give you more info if you kill the server (async moment). It could be due to an internal timeout, but it's hard to tell...
Seems like a timeout error. Did you have a sequence that took longer than 60 seconds to process? As a hotfix, you can increase the timeout threshold: ```sh export APHRODITE_ENGINE_ITERATION_TIMEOUT_S=120...
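A minimal sketch of that hotfix, assuming the server is launched from the same shell session afterwards so that it inherits the variable. The value 120 is just an example that doubles the 60 seconds mentioned above; the launch command itself is left out because it depends on your setup.

```sh
# Raise the engine iteration timeout (in seconds) for this shell session.
export APHRODITE_ENGINE_ITERATION_TIMEOUT_S=120

# Sanity check: a child process should see the exported value.
# The server must be started from this same shell to inherit it.
sh -c 'echo "timeout: ${APHRODITE_ENGINE_ITERATION_TIMEOUT_S}s"'
```

If you run the server in Docker instead, the variable has to be passed into the container (for example with `docker run -e`), since an export on the host shell alone won't reach it.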
Most of these issues were fixed in v0.6.0.
[Feature]: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
I doubt this applies to inference as much as it does for training. Admittedly, I haven't given the paper a thorough read yet.
Hi, sorry, I totally missed this issue! Can you run the Docker container in privileged mode?
We already resolved a similar issue related to Triton - it should be fixed in the latest Docker image. Have you tried it?
Can confirm this happens with Mixtral. Investigating.
Where did you set the MAX_JOBS variable? It should be set in the Dockerfile right before the build command towards the end.