Aaron Pham
Can you try without conda, just with virtualenv?
AFAIK there is a pretty easy way to do this manually. (IIRC there are also threads in the Discord about people doing this.)
Hi, please provide a minimal config that only contains relevant info for avante.nvim. Dumping your whole config won't help here.
No need to; on https://docs.vllm.ai/en/latest/cli/index.html we mention `--help`, which already includes the help string for it.
I don't think LoRA should be captured in the CUDA graph, especially in the case where you might want to switch between multiple different LoRAs. What is the behaviour that you observed with...
> the total model will not be captured by

Well, yes, that's the behaviour of `enforce_eager=True`... Correct me if I'm wrong, but you shouldn't capture the CUDA graph for LoRA?
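For reference, a minimal sketch of what I mean: serving a LoRA adapter with `enforce_eager=True`, so no CUDA graph is captured at all (the model name and adapter path here are placeholders, adjust to your setup):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enforce_eager=True disables CUDA graph capture entirely;
# enable_lora=True lets you pass a LoRARequest per generate() call.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,
    enforce_eager=True,
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=32)

outputs = llm.generate(
    ["Hello, my name is"],
    sampling_params,
    # Placeholder adapter name/id/path -- swap in your own LoRA here.
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora/adapter"),
)
for out in outputs:
    print(out.outputs[0].text)
```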
It is supported in https://github.com/vllm-project/vllm/pull/14626. Can you try again with 0.8.0?
From Slack, it seems to be supported for both v0 and v1 once you upgrade to the latest vLLM.
Seems to me like a node setup problem?