nano-vllm
Nano vLLM
**Thank you for this amazing open-source project!**

### Problem

Fixes the crash reported in #114 that occurs when the prompt length exactly equals `kvcache_block_size` (256 tokens). Additionally, optimizes scheduler...
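The snippet above describes an exact-boundary crash; the real fix lives in nano-vllm's `BlockManager`, but the boundary arithmetic can be sketched with a hypothetical helper (`num_blocks_needed` is not part of the actual codebase):

```python
def num_blocks_needed(prompt_len: int, block_size: int = 256) -> int:
    """Ceil-divide tokens into KV-cache blocks.

    A prompt of exactly `block_size` tokens needs 1 block and
    `block_size + 1` tokens needs 2. Off-by-one logic at exact
    multiples of the block size is the kind of boundary condition
    the fix above targets.
    """
    return (prompt_len + block_size - 1) // block_size
```

With `block_size=256`, prompts of 255, 256, and 257 tokens need 1, 1, and 2 blocks respectively; the 256-token case is precisely the boundary that must not fall through.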
feat: Add support for the Qwen2.5-VL model --- 🤖 Automated by Berrry Committer Closes #48
Docs: Clarify requirements in README (Fixes #57) --- 🤖 Automated by Berrry Committer Closes #57
Using 2 GPUs for tensor-parallel inference: `num_blocks` in `BlockManager` is determined by the remaining memory on GPU0. When allocating blocks, if the remaining memory on GPU0...
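The issue above points at sizing the KV cache from GPU0's free memory alone. Under tensor parallelism every rank holds its share of each block, so a more conservative sketch would size by the most constrained GPU. This is an illustrative helper, not nano-vllm's actual allocation code:

```python
def compute_num_blocks(free_bytes_per_gpu: list[int],
                       block_bytes: int,
                       utilization: float = 0.9) -> int:
    """Size the KV cache by the *most constrained* GPU.

    Using only GPU0's free memory (as the report above describes)
    can over-allocate when another rank has less headroom; taking
    the minimum across ranks avoids that.
    """
    budget = min(free_bytes_per_gpu) * utilization
    return int(budget // block_bytes)
```

For example, with 8 GB free on GPU0 but only 4 GB on GPU1, the block count is bounded by GPU1's budget, not GPU0's.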
## PR Summary

This PR adds **Mixture-of-Experts (MoE)** support to **nano-vllm**, addressing the current lack of MoE-related operators. The Qwen3 MoE model (and other MoE models) in Hugging Face Transformers...
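The PR summary above is truncated, but the core MoE operator it refers to is gating: each token's router logits select a few experts and their weights are renormalized. A minimal pure-Python sketch of top-k gating (the function name and shapes are illustrative, not nano-vllm's API):

```python
import math

def route_tokens(logits: list[list[float]], top_k: int = 2):
    """Top-k softmax gating, the common MoE routing scheme.

    For each token, softmax the router logits, keep the top_k
    experts, and renormalize their weights so they sum to 1.
    Returns, per token, a list of (expert_index, weight) pairs.
    """
    routed = []
    for row in logits:
        exps = [math.exp(x) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        idx = sorted(range(len(row)), key=lambda i: probs[i],
                     reverse=True)[:top_k]
        norm = sum(probs[i] for i in idx)
        routed.append([(i, probs[i] / norm) for i in idx])
    return routed
```

Each token's hidden state would then be the weighted sum of the selected experts' outputs; the renormalization step keeps that combination a convex average regardless of `top_k`.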
## Summary

This PR adds comprehensive multi-environment support to Nano-vLLM, enabling easier deployment and reproducibility across different platforms. The changes include pip, conda, and Docker installation methods while maintaining the...
```
warning: `VIRTUAL_ENV=/root/nano-vllm/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
Using CPython 3.11.13
Creating virtual environment at:...
```
Hi there. Since this is my first PR on GitHub, feedback is very welcome. Thanks in advance for your review! This PR fixes a crash that occurs when a sequence is...