nano-vllm issues

ERROR while installing with pip

5

Hello, while installing via pip inside a venv I got this error: ERROR: Could not find a version that satisfies the requirement triton>=3.0.0 (from nano-vllm) (from versions: none) ERROR: No matching...

Sabakhupenia

attention without flash-attn for windows with GTX1650

2

flash-attn is not available for my GTX 1650, even though I spent hours compiling it in WSL on Windows, so I rewrote this class in pure PyTorch. Most of the code was written by...

tiehexue

LFM2 models are not working

Thanks for the great work. I'm trying to use nano-vllm with the LFM2 350M-parameter model, but I ended up with an error. Does nano-vllm support LFM2 models?

gnana70

RowParallelLayer with bias crash

If a RowParallelLayer has a bias, its weight_loader can crash. This is because weight_loader uses tp_dim to compute shard_size, but the bias is a 1-D tensor and tp_dim = 1,...

ygch
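
The failure mode described in the issue above can be illustrated without PyTorch, using plain shape tuples. This is a minimal sketch under stated assumptions: `shard_bounds` is a hypothetical helper, not part of nano-vllm, and the guard it applies (loading a parameter unsharded when it has no dimension `tp_dim`) is one possible fix, not necessarily the one the project uses.

```python
def shard_bounds(shape, tp_dim, tp_size, tp_rank):
    """Return (start, shard_size) along tp_dim, or None when the parameter
    has no such dimension and should be loaded unsharded.

    Hypothetical helper for illustration only.
    """
    if tp_dim >= len(shape):
        # A 1-D bias with tp_dim = 1: shape[1] would raise IndexError here.
        return None
    shard_size = shape[tp_dim] // tp_size
    return tp_rank * shard_size, shard_size


# Assumed example shapes: weight is (out_features, in_features),
# bias is (out_features,).
weight_bounds = shard_bounds((4096, 11008), tp_dim=1, tp_size=2, tp_rank=1)
bias_bounds = shard_bounds((4096,), tp_dim=1, tp_size=2, tp_rank=1)
```

With these assumed shapes, the weight is split along its input dimension while the bias is left whole, since a row-parallel layer shards only the input dimension and the bias spans the unsharded output dimension.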

Regarding the test script

Hello @GeeeekExplorer, hope you are doing well. For robust testing and maintenance, I think it would be great to have full test scripts for all components and logic. So,...

PythonSummit

Can't run on dgx spark - flash-attn issues

3

I’m trying to run nano-vLLM on a DGX Spark box (Ubuntu 24.04.3 LTS, Python 3.12.3, NVIDIA GB10 GPU, CUDA driver 580.95.05, CUDA 13.0, compute capability 12.1). PyTorch is 2.9.0+cu130 with...

letsrock85

Add Qwen3-VL multimodal support

1

## Summary - add the Qwen3-VL multimodal model and loader entry so nano-vllm can run vision-language workloads - extend engine components (placeholder expansion, vision-cache slicing, KV guard) to mirror vLLM’s...

86MaxCao

With detailed Chinese comments for easy learning

### Description I have added detailed Chinese comments to the entire codebase to facilitate a better understanding of the underlying mechanisms. **Purpose:** To serve as a comprehensive guide for learners...

lioZ129

Can we integrate LMCache as the KV connector so we can use nano-vllm to test LMCache?

maobaolong

support flashinfer
