Nano vLLM

Results 43 nano-vllm issues

This patch is the first in a series showing how nano-vllm performs on AMD platforms.

Hi @GeeeekExplorer, thanks for your great work on nano-vllm. I just tried it on both AMD CDNA datacenter GPUs and AMD RDNA3/4 desktop GPUs, and it can work on both of...

Fix: Correct off-by-one error in KV-Cache block allocation. This pull request addresses a critical off-by-one error in the BlockManager's logic for allocating new KV-Cache blocks during the decoding phase. The...

PR Description What does this PR do? This PR introduces full support for the Qwen2 large language model (LLM) in the project

In #71, #66, #65, and #30, there were questions about when `can_append` and `may_append` should be applied to request new blocks. This PR will separate the logic for appending new...

Hi nano-vllm team, this is Yue. Nice to meet you! I'm just a fan of this repo and am learning from it. As I learn the code of block_manager, IIUC the current...

The `can_append` function in the `BlockManager` returns a boolean that indicates whether we can store a sampled token for the given sequence. Currently, the code snippet `len(seq) % self.block_size ==...

This PR introduces a new benchmark script, `serving_bench.py`, to evaluate the engine's performance under a continuous load of incoming requests, simulating a real-world serving scenario. **Note:** This PR is purely...