Simon Mo
Can we go further and reduce the templating to purely a JSON schema? I believe it is possible by framing it as { "tool_choice": one of the tool names, "tool_params": constrained...
cc @njhill
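A minimal sketch of what that framing could look like, assuming hypothetical tool names (`get_weather`, `search`) and using JSON Schema's `oneOf`/`const` to pin `tool_choice` to a tool name and constrain `tool_params` to that tool's parameter schema:

```python
# Hypothetical sketch, not vLLM's actual implementation: collapse
# tool-calling templates into one JSON schema for guided decoding.
# The tool names and parameter schemas below are made up.
tools = {
    "get_weather": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    "search": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def tool_call_schema(tools):
    # One branch per tool: tool_choice is fixed to that tool's name,
    # and tool_params must match that tool's parameter schema.
    return {
        "type": "object",
        "oneOf": [
            {
                "properties": {
                    "tool_choice": {"const": name},
                    "tool_params": params,
                },
                "required": ["tool_choice", "tool_params"],
            }
            for name, params in tools.items()
        ],
    }

schema = tool_call_schema(tools)
```

The whole schema can then be handed to a guided-decoding backend, with no per-tool template left.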
I think the original idea is that the OpenAI-style API also has a `stream` flag that changes the behavior of the output
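A toy sketch (not vLLM's actual server code) of how a `stream` flag changes the response shape, from one assembled body to a sequence of per-token chunks:

```python
# Toy illustration only: a server handler might branch on `stream` like this.
def complete(prompt, stream=False):
    tokens = ["Hello", ",", " world"]  # stand-in for model output
    if stream:
        # Streaming: yield one chunk per token, like SSE "data:" events.
        return (t for t in tokens)
    # Non-streaming: return the fully assembled completion in one body.
    return "".join(tokens)

full = complete("hi")                       # single string
chunks = list(complete("hi", stream=True))  # list of chunks
```

The point is that the same endpoint produces two different output contracts, so the client has to know which mode it asked for.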
Which hardware are you using? It looks like after processing the prompt, there's very little free space left for computing the generation tokens (see `# GPU blocks: 37`). Maybe consider...
Yeah, it does look like two T4s give you 32G of GPU memory. The 13B model takes about 26G in parameters, which leaves very little for the KV cache. Maybe use just...
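The back-of-envelope arithmetic behind those numbers (assuming fp16 weights at 2 bytes per parameter and 16 GB per T4; activations and CUDA overhead would eat into the remainder further):

```python
# Rough memory budget, not a vLLM calculation.
params = 13e9          # 13B parameters
bytes_per_param = 2    # fp16/bf16 weights
weight_gb = params * bytes_per_param / 1e9  # parameter memory in GB

total_gb = 2 * 16      # two T4s at 16 GB each
free_gb = total_gb - weight_gb  # what's left for KV cache + activations

print(weight_gb, free_gb)
```

With only a few GB left after weights, vLLM can allocate only a handful of KV cache blocks, which matches the tiny `# GPU blocks: 37` in the log.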
Sorry, I just merged the other PR; can you resolve the conflict?
🤦♂️ Sorry, another conflict
Sounds good. I agree with @casper-hansen that this is very valuable and a good start for #3780
At a high level, I would imagine that running more end-to-end tests like https://github.com/EleutherAI/lm-evaluation-harness, which can directly target vLLM with a simple command, would be better. For actual testing I...
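For reference, a sketch of what such a run could look like; the harness has a vLLM backend, but the model name, task, and flags below are illustrative assumptions, so check the harness docs for the exact arguments your version supports:

```shell
# Illustrative lm-evaluation-harness invocation against vLLM.
# Model, task, and batch settings here are placeholders.
lm_eval \
  --model vllm \
  --model_args pretrained=meta-llama/Llama-2-7b-hf,tensor_parallel_size=1 \
  --tasks gsm8k \
  --batch_size auto
```

This kind of single-command end-to-end run could complement the unit tests rather than replace them.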
This is a different task. @youkaichao can you create a new issue tracking it?