nano-vllm
Nano vLLM
Hello, while installing via pip inside a venv I got this error: ERROR: Could not find a version that satisfies the requirement triton>=3.0.0 (from nano-vllm) (from versions: none) ERROR: No matching...
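`(from versions: none)` means pip found no triton distribution compatible with the interpreter and platform it is resolving for. The official Triton wheels on PyPI are built for Linux, so running pip from a native Windows or macOS Python, or from a Python version with no published wheels, is a likely cause; treat that as an assumption to verify. The standard-library sketch below prints what pip is resolving against and, if triton is already present, whether it meets the `>=3.0.0` constraint. The helper names `parse_release`, `satisfies`, and `diagnose` are hypothetical, and the version comparison is simplified (it ignores pre-release and local tags).

```python
import platform
import re
import sys
from importlib import metadata

# Minimum version taken from the "triton>=3.0.0" requirement in the pip error.
REQUIRED = (3, 0, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '3.1.0+cu121' -> (3, 1, 0).

    Simplification: pre-release and local tags (rc1, +cu121) are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version: str, minimum: tuple = REQUIRED) -> bool:
    """Compare release tuples after zero-padding them to the same length."""
    release = parse_release(version)
    width = max(len(release), len(minimum))
    padded = release + (0,) * (width - len(release))
    target = minimum + (0,) * (width - len(minimum))
    return padded >= target


def diagnose() -> str:
    """Summarize the platform/interpreter that pip resolves wheels for."""
    env = (f"platform={sys.platform} machine={platform.machine()} "
           f"python={platform.python_version()}")
    try:
        installed = metadata.version("triton")
    except metadata.PackageNotFoundError:
        return f"triton not installed ({env})"
    return f"triton {installed} installed, satisfies >=3.0.0: {satisfies(installed)} ({env})"


if __name__ == "__main__":
    print(diagnose())
```

Run it with the same `python` that the venv's `pip` uses; a `platform=win32` result points at the Linux-only wheel assumption above.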
flash-attn is not available for my GTX 1650, even though I spent hours compiling it under WSL on Windows, so I rewrote this class in pure PyTorch. Most of the code was written by...
Thanks for the great work. I'm trying to use nano-vllm with the LFM2 350M-parameter model, but I ran into an error. Does nano-vllm support LFM2 models?
If a RowParallelLayer layer has a bias, its weight_loader can crash. This is because weight_loader uses tp_dim to compute shard_size, but the bias is a 1-D tensor and tp_dim = 1,...
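A row-parallel layer splits its weight along the input dimension (dim 1 of a `[out_features, in_features]` matrix), so each rank produces a partial output that is later summed across ranks. The bias has length `out_features` and is not partitioned. Calling `size(1)` on it raises `IndexError`, which matches the crash described. The sketch below guards the 1-D case. The function name and signature are hypothetical, modeled on the issue's description rather than copied from nano-vllm.

```python
import torch
import torch.nn as nn


def row_parallel_weight_loader(param: nn.Parameter, loaded_weight: torch.Tensor,
                               tp_rank: int, tp_dim: int = 1) -> None:
    """Copy this rank's slice of a row-parallel weight, or the whole bias."""
    if param.dim() == 1:
        # Bias (length out_features) is not sharded: every rank keeps it whole.
        param.data.copy_(loaded_weight)
        return
    # Weight: take this rank's contiguous slice along the input dimension.
    shard_size = param.size(tp_dim)
    start = tp_rank * shard_size
    param.data.copy_(loaded_weight.narrow(tp_dim, start, shard_size))
```

Since the partial outputs are summed by an all-reduce, the forward pass should add the bias on only one rank (or once after the reduction); otherwise it ends up counted `tp_size` times.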
Hello @GeeeekExplorer, hope you are doing well. For robust testing and maintenance, I think it would be great to have full test scripts for all components and logic. So,...
I’m trying to run nano-vLLM on a DGX Spark box (Ubuntu 24.04.3 LTS, Python 3.12.3, NVIDIA GB10 GPU, CUDA driver 580.95.05, CUDA 13.0, compute capability 12.1). PyTorch is 2.9.0+cu130 with...
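The report is truncated, so the actual error is unknown. On a compute capability 12.1 GPU, a reasonable first check is whether the installed PyTorch wheel ships GPU code usable on that architecture. The sketch applies the CUDA binary-compatibility rule (a cubin built for X.y runs on X.z devices when z >= y) to the entries returned by `torch.cuda.get_arch_list()`. It skips arch-specific entries such as `sm_90a` and does not account for PTX JIT compilation, so a `False` result is a hint rather than proof of failure. The helper names `compatible_sass` and `report` are hypothetical. Compiled dependencies such as Triton and flash-attn raise the same question and are not checked here.

```python
import re

import torch


def compatible_sass(capability, arch_list):
    """True if arch_list holds a cubin target usable on a GPU of this capability.

    A cubin for sm_Xy runs on X.z devices with z >= y (same major version).
    Arch-specific entries like 'sm_90a' are skipped: they are not portable.
    """
    major, minor = capability
    for entry in arch_list:
        m = re.fullmatch(r"sm_(\d+)(\d)", entry)  # 'sm_121' -> ('12', '1')
        if m and int(m.group(1)) == major and int(m.group(2)) <= minor:
            return True
    return False


def report():
    """Collect the PyTorch/CUDA facts relevant to architecture support."""
    info = {"torch": torch.__version__,
            "cuda_build": torch.version.cuda,
            "cuda_available": torch.cuda.is_available()}
    if info["cuda_available"]:
        cap = torch.cuda.get_device_capability(0)
        arches = torch.cuda.get_arch_list()
        info.update(device=torch.cuda.get_device_name(0), capability=cap,
                    arch_list=arches, compatible=compatible_sass(cap, arches))
    return info


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```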
## Summary - add the Qwen3-VL multimodal model and loader entry so nano-vllm can run vision-language workloads - extend engine components (placeholder expansion, vision-cache slicing, KV guard) to mirror vLLM’s...
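Placeholder expansion is the step where each image placeholder token in the prompt is widened to one position per visual embedding, so sequence lengths (and therefore KV-cache allocation) account for every image token. The stdlib sketch below assumes one placeholder token id per image and a Qwen2-VL-style count of `(t * h * w) / merge_size**2` tokens per image, with `merge_size = 2`. The helper names, that formula, and the merge size are assumptions, not taken from the PR.

```python
def vision_token_count(grid_thw, merge_size=2):
    """Tokens one image contributes, given its (t, h, w) patch grid (assumed formula)."""
    t, h, w = grid_thw
    return (t * h * w) // (merge_size ** 2)


def expand_placeholders(token_ids, placeholder_id, counts):
    """Replace the i-th placeholder with counts[i] copies of itself."""
    expanded, used = [], 0
    for tok in token_ids:
        if tok == placeholder_id:
            if used == len(counts):
                raise ValueError("more placeholders than images")
            expanded.extend([placeholder_id] * counts[used])
            used += 1
        else:
            expanded.append(tok)
    if used != len(counts):
        raise ValueError("more images than placeholders")
    return expanded
```

For example, `expand_placeholders([1, 99, 2], 99, [vision_token_count((1, 4, 6))])` widens the single placeholder `99` to six copies, one per merged patch.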
### Description I have added detailed Chinese comments to the entire codebase to facilitate a better understanding of the underlying mechanisms. **Purpose:** To serve as a comprehensive guide for learners...