Richard Li
Hi, thanks for the bug report! I'm not sure switching from bash to a Makefile would really simplify things; my preference would be to fix the root cause. But...
Thanks for the request. Yes, it's definitely possible, and we've discussed providing importers for Swagger, Protobuf, and others. If you have a specific test case or example, we'd love to...
Hi @hello-ashleyintech! Thanks for the prompt response. Yes, I figured all the above out. The other thing that isn't mentioned is how to write your Slack event handler to handle...
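In case it helps someone else, here's a minimal sketch of the kind of event handler I mean, assuming the Python `slack_bolt` package running in Socket Mode; the `app_mention` event and the env var names are illustrative, not from my actual app:

```python
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

# Bot token (xoxb-...) for the Web API; app-level token (xapp-...) for Socket Mode.
app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def handle_mention(event, say):
    # Reply in the channel where the bot was mentioned.
    say(f"You said: {event.get('text', '')}")

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```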
Hi @hello-ashleyintech. Thanks for all your help. I've gotten most (?) of this working, but I have one more error I can't figure out. 1. I've created an OAuth settings...
I'm going to close this issue. I do think the documentation for Python can be improved, because there's quite a bit of magic going on behind the scenes that is...
I can trigger this error reliably when sending requests with larger token counts. I've reproduced this on both `meta-llama/Meta-Llama-3-8B-Instruct` and `mistralai/Mistral-7B-Instruct-v0.1`. In my situation, I'm deploying vLLM on a...
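For reference, this is roughly how I trigger it; a sketch only, assuming a vLLM OpenAI-compatible server on localhost:8000 (the URL, model, and prompt length here are illustrative):

```python
import requests

# Pad the prompt to push the token count up; the repetition is arbitrary.
long_prompt = "Summarize the following text. " + ("lorem ipsum dolor sit amet " * 400)

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "prompt": long_prompt,
        "max_tokens": 256,
    },
    timeout=300,
)
print(resp.status_code, resp.text[:500])
```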
I did some additional experimentation:
* On a 64GB VM, CPU only, I was able to successfully trigger the error with a 351-token prompt.
* On a 128GB VM,...
I've had some success by increasing `ENGINE_ITERATION_TIMEOUT_S`. The offending code appears to be here: https://github.com/vllm-project/vllm/blob/main/vllm/engine/async_llm_engine.py#L630. When the engine takes too long, it times out, but then leaves the engine...
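A sketch of the workaround, assuming the env var name in the code linked above; newer vLLM versions appear to spell it `VLLM_ENGINE_ITERATION_TIMEOUT_S`, so check the version you're running:

```python
import os

# Must be set before vLLM reads it; the default in the linked code is 60 seconds.
os.environ["ENGINE_ITERATION_TIMEOUT_S"] = "180"

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Illustrative model name; use whatever you're deploying.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="meta-llama/Meta-Llama-3-8B-Instruct")
)
```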
Note that GitHub Copilot Chat (https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat) is a separate extension from the main code-completion one. I agree it would be super useful; I'm currently looking for a plug-in that does...