
Nano vLLM

Results 43 nano-vllm issues

```python
prompts = [
    "Hello" * 248,
] * 513
```

I ran `example.py` with the above prompts, and it crashed.

```
[rank0]: torch.AcceleratorError: CUDA error: invalid configuration argument
[rank0]:...
```

Hi GeeeekExplorer, I’ve been exploring nano-vLLM and really appreciate the lightweight design and clean Python API. I noticed that currently the project primarily uses `pyproject.toml`/UV for dependency management. I’d like...

I get the following problems. Generating: 88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 44/50 [02:56

Hi, nano vLLM with engine v1 is here! Since v0.6.0 release of vLLM, there is a brand new engine backend with multiprocessing, as described in [the official blog](https://blog.vllm.ai/2025/01/27/v1-alpha-release.html). I have...

Didn't expect that such a nano-demo would require an Nvidia GPU > 20 series and Linux (WSL) for Triton. Didn't read far/deep enough (pyproject.toml), so I bumped into the...

Great work on this minimal implementation! * I'm interested in extending the nano-vllm framework to support Qwen2.5vl. * Since Qwen3 has already been implemented, most of the necessary building blocks...

Hi, to improve serving throughput and utilization without compromising TTFT, is there support for chunked prefill, i.e., the engine step supporting the forward pass with chunked...

Hey, nice job! I'm wondering whether you'd like individual contributors to join in and develop new features for nano-vllm. If so, could you please list some urgent...

Today I tested your work and found that it has achieved approximately a 10% performance improvement compared to vllm on the A100. This is an elegant and outstanding job. I...

`def can_append(self, seq: Sequence) -> bool: return len(self.free_block_ids) >= (len(seq) % self.block_size == 1)` Is this an error?
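One possible reading of the comparison above, sketched under assumptions: in Python, `bool` is a subclass of `int`, so the right-hand side evaluates to `1` or `0`. If `len(seq)` already counts the newly appended token, the expression would mean "one free block is needed only when the sequence has just spilled into a new block". The `BlockPool` class below is a hypothetical stand-in written for illustration, not the project's actual block manager.

```python
class BlockPool:
    """Hypothetical stand-in for a block manager (illustration only)."""

    def __init__(self, num_free_blocks: int, block_size: int):
        self.free_block_ids = list(range(num_free_blocks))
        self.block_size = block_size

    def can_append(self, seq_len: int) -> bool:
        # bool is a subclass of int: True == 1 and False == 0.
        # Assumption: seq_len already includes the token being appended,
        # so a remainder of 1 means that token starts a fresh block.
        needs_new_block = seq_len % self.block_size == 1
        # Compares the free-block count against 1 or 0.
        return len(self.free_block_ids) >= needs_new_block


empty_pool = BlockPool(num_free_blocks=0, block_size=4)
print(empty_pool.can_append(4))  # True: the token fits in the current block
print(empty_pool.can_append(5))  # False: a new block is needed, none are free
```

Under that reading the one-liner is a compact idiom rather than a bug, although an explicit `1 if ... else 0` would arguably be easier to follow.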