nano-vllm
attention without flash-attn for windows with GTX1650
flash-attn is not available for my GTX 1650 even though I spent hours compiling it under Windows WSL, so I rewrote this class in pure PyTorch. Most of the code was written by an LLM and then reviewed line by line to fix bugs. It works now. There is also a store_kvcache_pytorch that avoids Triton, so the whole thing should run without custom CUDA kernels.
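For reference, a pure-PyTorch fallback for flash-attn's fused kernel is essentially scaled dot-product attention with an explicit causal mask. This is a minimal sketch (the function name and tensor layout are my assumptions, not nano-vllm's exact signature):

```python
import math
import torch

def attention_pytorch(q, k, v):
    # Hypothetical flash-attn replacement: plain causal attention.
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    head_dim = q.size(-1)
    seq_len = q.size(-2)
    # Scaled attention scores: QK^T / sqrt(d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    # Causal mask: position i may not attend to positions j > i.
    mask = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

This materializes the full (seq_len, seq_len) score matrix, so it uses more memory than flash-attn, but it needs nothing beyond stock PyTorch and runs on any device.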
@tiehexue Hey, can you help me remove Triton dependencies? I want to learn the code, but can't run it directly(because of triton dependencies)
Just remove the Triton dependency and use store_kvcache_pytorch instead.
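The Triton kernel this replaces scatters new key/value tensors into the paged KV cache at the slots given by a slot mapping. In pure PyTorch that is just an advanced-indexing assignment; a sketch under assumed shapes (the cache layout here is my guess, check it against the repo's actual tensors):

```python
import torch

def store_kvcache_pytorch(key, value, k_cache, v_cache, slot_mapping):
    """Scatter new K/V entries into the cache without Triton.

    Assumed layout (hypothetical, verify against nano-vllm):
      key, value:        (num_tokens, num_heads, head_dim)
      k_cache, v_cache:  (num_slots, num_heads, head_dim)
      slot_mapping:      (num_tokens,) long tensor of destination slots
    """
    num_tokens = key.size(0)
    # Flatten the per-token features and write them to the mapped slots.
    # Indexing a view writes through to the underlying cache storage.
    k_cache.view(k_cache.size(0), -1)[slot_mapping] = key.reshape(num_tokens, -1)
    v_cache.view(v_cache.size(0), -1)[slot_mapping] = value.reshape(num_tokens, -1)
```

Because it is a plain tensor assignment, this works on CPU as well as GPU, which is what makes the Triton-free path runnable anywhere.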