AlpinDale
Whoops, sorry, misinput.
Please use the Dockerfile.cpu, or follow the steps outlined in that Dockerfile. The docs are out of date; I'll make a note to fix this. Thanks!
Numpy isn't an issue; that's mostly a benign error. The real issue is here:

```console
-- Configuring done (17.7s)
CMake Error: The following variables are used in this project, but...
```
I've noticed the same, unfortunately. I've been looking into alternatives; we'll probably bring back the old kernels as an optional method for LoRA ops, maybe toggled with env variables...
Solved by doing `--max-seq-len-to-capture $MAX_MODEL_LEN`, as discussed offline.
Can you limit this PR to just the metrics fix? Block Manager V1 is going to be deprecated as of #1300, so the other part of this PR will conflict...
The primary culprit was the upstream PR vllm-project/vllm#3977, which drastically changed how quantized layers were handled. This made working with exllamav2 extremely difficult. If someone can make the existing exl2...
As of #1215 and #1216, this issue should be fixed. Also, your first command is incorrect @cassettesgoboom. You can't load a GGUF model in another quant format (deepspeedfp in your...
The config format has changed between the releases. You can now use a config file by doing `aphrodite run --config config.yaml`
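For reference, a minimal `config.yaml` might look like the sketch below. The keys mirror the equivalent CLI flags; the model name and values here are placeholders I made up for illustration, not something from the release notes, so adjust them to your setup:

```yaml
# Hypothetical example config for `aphrodite run --config config.yaml`.
# Keys correspond to the engine's CLI flags; values are placeholders.
model: mistralai/Mistral-7B-Instruct-v0.2
max-model-len: 8192
port: 2242
```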
Thanks for reporting. Seems to be an issue with the FusedMoE triton kernels. I will investigate and see what I can come up with.