Nano vLLM
This patch is the first in a series showing how nano-vllm performs on AMD platforms.
Hi @GeeeekExplorer, thanks for your great work on nano-vllm. I just tried it on both an AMD CDNA datacenter GPU and an AMD RDNA3/4 desktop GPU, and it can work on both of...
Fix: Correct off-by-one error in KV-Cache block allocation
This pull request addresses a critical off-by-one error in the BlockManager's logic for allocating new KV-Cache blocks during the decoding phase. The...
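The full diff is not shown here, but a minimal, hypothetical illustration of this class of bug (the function names and block size below are assumptions, not the repo's actual code) is the difference between floor and ceiling division when counting how many KV-cache blocks a sequence needs:

```python
import math

BLOCK_SIZE = 16  # illustrative block size, not necessarily nano-vllm's default

def blocks_needed_buggy(num_tokens: int) -> int:
    # Floor division drops the partially filled final block: 17 tokens
    # report 1 block even though the 17th token spills into a second one.
    return num_tokens // BLOCK_SIZE

def blocks_needed_fixed(num_tokens: int) -> int:
    # Ceiling division also counts a partially filled final block.
    return math.ceil(num_tokens / BLOCK_SIZE)
```

During decoding this matters because the sequence grows one token at a time, so the off-by-one surfaces exactly at block boundaries.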
PR Description
What does this PR do?
This PR introduces full support for the Qwen2 large language model (LLM) in the project.
In #71 #66 #65 #30 , there were questions about the timing of applying `can_append` and `may_append` for requesting new blocks. This PR will separate the logic for appending new...
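As a hedged sketch of the separation this PR describes (all names and semantics below are assumptions for illustration, not the repo's exact code), the idea is to keep `can_append` as a pure, read-only admission check and let `may_append` do the mutating block allocation, assuming the sequence length is inspected before the sampled token is appended:

```python
class BlockManager:
    """Illustrative block manager with a free-list of fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_block_ids = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of allocated block ids

    def can_append(self, seq_len: int) -> bool:
        # Pure check, no state change: a free block is required only
        # when the currently allocated blocks are exactly full.
        if seq_len % self.block_size == 0:
            return len(self.free_block_ids) > 0
        return True

    def may_append(self, seq_id: int, seq_len: int) -> None:
        # Mutating step: actually take a block when a new one is needed.
        if seq_len % self.block_size == 0:
            block_id = self.free_block_ids.pop()
            self.block_tables.setdefault(seq_id, []).append(block_id)
```

Keeping the check side-effect free lets a scheduler call `can_append` on many sequences before committing to any allocation.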
Hi nano-vllm team, this is Yue, nice to meet you! I'm just a fan of this repo and learning from it. As I read the code of block_manager, if I understand correctly (IIUC) the current...
The `can_append` function in the `BlockManager` returns a boolean that indicates whether we can store a sampled token for the given sequence. Currently, the code snippet `len(seq) % self.block_size ==...
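The snippet above is truncated, but the crux of the discussion can be illustrated with a hypothetical sketch (not the repo's exact code): whether the "new block needed" test compares against `0` or `1` depends on whether `len(seq)` is inspected before or after the sampled token has been appended.

```python
def needs_new_block_before_append(seq_len: int, block_size: int) -> bool:
    # Checked BEFORE appending: the next token starts a fresh block
    # exactly when all current blocks are full.
    return seq_len % block_size == 0

def needs_new_block_after_append(seq_len: int, block_size: int) -> bool:
    # Checked AFTER the token is already counted in seq_len: the token
    # just added is the first of a fresh block when the length is one
    # past a block boundary.
    return seq_len % block_size == 1
```

Both conditions describe the same event; they only differ in which side of the append they observe the length from.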
This PR introduces a new benchmark script, `serving_bench.py`, to evaluate the engine's performance under a continuous load of incoming requests, simulating a real-world serving scenario. **Note:** This PR is purely...
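As a hedged sketch of what such a serving benchmark typically does (the generator below is an assumption for illustration, not the actual `serving_bench.py`), requests arrive over time with random inter-arrival gaps, forming a Poisson process, rather than being submitted all at once:

```python
import random

def simulate_arrivals(num_requests: int, rate_per_s: float, seed: int = 0):
    """Yield (arrival_time_s, request_id) pairs with exponentially
    distributed inter-arrival gaps, i.e. a Poisson arrival process."""
    rng = random.Random(seed)
    t = 0.0
    for i in range(num_requests):
        t += rng.expovariate(rate_per_s)
        yield t, i
```

The benchmark would then feed each request to the engine at its arrival time and record per-request latency alongside overall throughput.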