Bowen Wang
On macOS with Python 3.11:

```python
$ docker ps | choose -1
/opt/homebrew/Cellar/choose/0.1.0_4/libexec/bin/choose:138: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(names) is 1:
CONTAINER ID IMAGE COMMAND...
```
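The warning comes from comparing an integer with `is`, which tests object identity rather than value. A minimal sketch of the likely fix follows. The helper `pick_single` and its return behavior are hypothetical, made up for illustration, and are not taken from the `choose` source; only the `len(names) == 1` comparison mirrors the line quoted in the warning.

```python
import warnings


def pick_single(names):
    """Hypothetical helper: return the sole element, or the list unchanged."""
    # Compare values with ==. `is` checks identity, and it only appears to
    # work for small integers because CPython happens to cache them.
    if len(names) == 1:
        return names[0]
    return names


# Compiling the original form shows where the warning originates.
# Python 3.8 and newer report a SyntaxWarning at compile time.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile("n = 1\nif n is 1:\n    pass\n", "<demo>", "exec")

for w in caught:
    print(w.category.__name__, w.message)
```

Replacing `is` with `==` keeps the behavior the author intended and silences the warning without suppressing it globally.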
Thank you for creating this benchmark. When I tried to run the evaluation, creating Conda environments failed for old Python versions such as 3.5 or 3.6. This seems...
FlashInfer [0.2.3](https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.2.3) introduced breaking changes to its sampler API. This PR updates the call sites in vLLM to match. FIX #14815 FIX #15666
This PR introduces support for **dynamic load balancing** in **expert parallelism (EP)** for the deployment of **Mixture-of-Experts (MoE)** models. Dynamic load balancing is essential for auxiliary-loss-free MoE models, such as...
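To illustrate the general idea of balancing experts across expert-parallel ranks, here is a minimal sketch. It is not the PR's algorithm: it uses a plain greedy heuristic (heaviest expert first, placed on the least-loaded rank) and ignores expert replication and per-rank slot limits. The function name `balance_experts` and the token counts are assumptions chosen for illustration.

```python
import heapq


def balance_experts(expert_loads, num_ranks):
    """Greedily map experts to ranks so per-rank token load is roughly even.

    expert_loads: dict of expert id -> observed token count (hypothetical data).
    Returns a dict of rank id -> list of expert ids.
    """
    # Min-heap of (accumulated load, rank id); the lightest rank pops first.
    heap = [(0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    placement = {rank: [] for rank in range(num_ranks)}

    # Place the heaviest experts first so later, lighter ones fill the gaps.
    by_load = sorted(expert_loads.items(), key=lambda kv: kv[1], reverse=True)
    for expert, load in by_load:
        total, rank = heapq.heappop(heap)
        placement[rank].append(expert)
        heapq.heappush(heap, (total + load, rank))
    return placement


# Hypothetical token counts observed for four experts.
loads = {0: 90, 1: 10, 2: 50, 3: 50}
placement = balance_experts(loads, num_ranks=2)
rank_totals = {r: sum(loads[e] for e in experts) for r, experts in placement.items()}
print(placement, rank_totals)
```

In a dynamic setting, a function like this would be rerun periodically on fresh load statistics, and experts would be moved only when the new placement differs from the current one.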