Simon Mo
### Discussed in https://github.com/vllm-project/vllm/discussions/4745

Originally posted by **tanliboy** May 10, 2024

Hi vLLM team, We have been using vLLM for serving models, and it went really well. We have been...
Easier to type. It will now be:

```
(base) xmo@simon-devbox:~/vllm$ vllm serve --help
usage: vllm serve [options]

positional arguments:
  model       The model tag to serve

options:
  -h, --help  show this...
```
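For context, a server started with the new CLI exposes the same OpenAI-compatible endpoints as before, so it can be queried with any OpenAI client. The sketch below is only illustrative: the model tag, the default port 8000, and the placeholder API key are assumptions, not part of the original post.

```python
# Minimal client sketch against a server started with, e.g.:
#   vllm serve Qwen/Qwen2.5-1.5B-Instruct
# The model tag and default port 8000 are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # must match the tag passed to `vllm serve`
    messages=[{"role": "user", "content": "Say hello from vLLM."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```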
I have recently added an H100 agent which will be online for 12 hours per day. Let's test it out. Successful build: https://buildkite.com/vllm/performance-benchmark/builds/4258
### Anything you want to discuss about vllm.

This document includes the features in vLLM's roadmap for Q3 2024. Please feel free to discuss and contribute, as this roadmap is...
### Anything you want to discuss about vllm.

This is a meta RFC tracking some of the performance enhancement work we are prioritizing.

- [ ] https://github.com/vllm-project/vllm/issues/6797
- [ ]...
### Anything you want to discuss about vllm.

- [x] #8390
- [x] #8375
- [x] #8399
- [ ] #8417
- [x] #8415
- [x] #8376
- [x] #8425
This issue describes the high-level directions that "create LLM Engine V2". We want the design to be as transparent as possible and have created this issue to track progress and...
This page is accessible via [roadmap.vllm.ai](https://roadmap.vllm.ai)

### Themes

As before, we categorized our roadmap into 6 broad themes: broad model support, wide hardware coverage, state-of-the-art performance optimization,...
Running the server (using the vLLM CLI or our [docker image](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html)); a client-side request sketch follows this snippet:

* `vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16`
* `vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --enforce-eager --max-num-seqs 32 --tensor-parallel-size 8`

Currently:

* Only...
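For reference, a request against the 11B server above can be sent through the OpenAI-compatible chat API. The sketch below assumes the default port 8000, no API key, and uses a placeholder image URL; adjust these for your setup.

```python
# Sketch of a multimodal request to a server started with
# `vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16`.
# The port (8000) and the image URL are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=64,
)
print(response.choices[0].message.content)
```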
Please leave comments here about your usage of V1. Does it work? Does it not work? Which features do you need in order to adopt it? Any bugs? For bug...
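If it helps anyone trying V1 out, here is a minimal offline sketch. It assumes the V1 engine is still opted into via the `VLLM_USE_V1` environment variable, as in the early V1 releases, and the model is chosen only as an example; check the docs for your installed version.

```python
import os

# Opt in to the V1 engine before importing vLLM (flag name as used in the
# early V1 releases; later versions may enable V1 by default).
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

# Example model chosen only for illustration.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What changed in vLLM V1?"], params)
print(outputs[0].outputs[0].text)
```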