Simon Mo

Results 57 issues of Simon Mo

### 🚀 The feature, motivation and pitch While the existing Outlines state machine provides great state-of-the-art performance, it trades off a one-off compile time when working with...

feature request

### 🚀 The feature, motivation and pitch We currently do not apply a chat template for the offline `LLM` class. It might be useful to provide a similar interface to the Hugging Face chat...

feature request

### 🚀 The feature, motivation and pitch Thanks to our amazing community, we have gathered a set of good chat templates for models. These templates are useful when the original...

feature request

### 🚀 The feature, motivation and pitch Recently, Outlines updated its interface from FSM to Guide to support "acceleration"/"fast-forward", which will output the next sets of tokens if they are directly...

feature request

### 🚀 The feature, motivation and pitch #2888 added a prototype for the AI Controller Interface, which is a WASM-based runtime for guided generation. We would like to integrate this...

feature request

### Anything you want to discuss about vllm. Currently, we do not run model tests on A100 machines because we can't get any capacity in GCP. https://skypilot.readthedocs.io/ supports runpod and...

misc

After #1662 (initial metrics support) and #1756 (refactoring chat endpoint), it will become practical to include latency metrics that are important in production (courtesy of @Yard1): * histogram of time to...

help wanted
good first issue

Also aliased as `functions` and `function_call` in deprecated parameters. https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools After #1756 is merged (thanks @Tostino!), it should be straightforward to add this as a core parameter to the OpenAI-compatible...

help wanted
good first issue

### Anything you want to discuss about vllm. Even though vLLM is type annotated, we have not enabled type checking. It would be useful to add it, even incrementally.

misc