Simon Mo
### Discussed in https://github.com/vllm-project/vllm/discussions/4745

Originally posted by **tanliboy** May 10, 2024

Hi vLLM team, We have been using vLLM for serving models, and it went really well. We have been...
Easier to type. It will now be:

```
(base) xmo@simon-devbox:~/vllm$ vllm serve --help
usage: vllm serve [options]

positional arguments:
  model       The model tag to serve

options:
  -h, --help  show this...
```
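For context, a server started with the new CLI exposes the same OpenAI-compatible endpoints as before, so it can be queried with any OpenAI client. The sketch below is only illustrative: the model tag, the default port 8000, and the placeholder API key are assumptions, not part of the original post.

```python
# Minimal client sketch against a server started with, e.g.:
#   vllm serve Qwen/Qwen2.5-1.5B-Instruct
# The model tag and default port 8000 are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # must match the tag passed to `vllm serve`
    messages=[{"role": "user", "content": "Say hello from vLLM."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```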
I have recently added an H100 agent which will be online for 12 hours per day. Let's test it out. Successful build: https://buildkite.com/vllm/performance-benchmark/builds/4258
### Anything you want to discuss about vllm.

This document includes the features in vLLM's roadmap for Q3 2024. Please feel free to discuss and contribute, as this roadmap is...
### Anything you want to discuss about vllm.

This is a meta RFC tracking some of the performance enhancement work we are prioritizing.

- [ ] https://github.com/vllm-project/vllm/issues/6797
- [ ]...
### Anything you want to discuss about vllm.

- [x] #8390
- [x] #8375
- [x] #8399
- [ ] #8417
- [x] #8415
- [x] #8376
- [x] #8425
This issue describes the high-level directions that "create LLM Engine V2". We want the design to be as transparent as possible and have created this issue to track progress and...
This page is accessible via [roadmap.vllm.ai](https://roadmap.vllm.ai)

### Themes

As before, we categorized our roadmap into 6 broad themes: broad model support, wide hardware coverage, state-of-the-art performance optimization,...
Running the server (using the vLLM CLI or our [docker image](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html)); a client-side request sketch follows this snippet:

* `vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16`
* `vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --enforce-eager --max-num-seqs 32 --tensor-parallel-size 8`

Currently:

* Only...
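For reference, a request against the 11B server above can be sent through the OpenAI-compatible chat API. The sketch below assumes the default port 8000, no API key, and uses a placeholder image URL; adjust these for your setup.

```python
# Sketch of a multimodal request to a server started with
# `vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16`.
# The port (8000) and the image URL are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=64,
)
print(response.choices[0].message.content)
```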
Please leave comments here about your usage of V1. Does it work? Does it not work? Which features do you need in order to adopt it? Any bugs? For bug...
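If it helps anyone trying V1 out, here is a minimal offline sketch. It assumes the V1 engine is still opted into via the `VLLM_USE_V1` environment variable, as in the early V1 releases, and the model is chosen only as an example; check the docs for your installed version.

```python
import os

# Opt in to the V1 engine before importing vLLM (flag name as used in the
# early V1 releases; later versions may enable V1 by default).
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

# Example model chosen only for illustration.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What changed in vLLM V1?"], params)
print(outputs[0].outputs[0].text)
```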