nano-vllm

Attention without flash-attn for Windows with GTX 1650

Open tiehexue opened this issue 1 month ago • 2 comments

flash-attn is not available for my GTX 1650, even though I spent hours compiling it under WSL on Windows, so I rewrote this class in pure PyTorch. Most of the code was written by an LLM, and I reviewed it line by line to fix bugs. It works now. There is also a store_kvcache_pytorch that does not use Triton, so the whole thing should run without CUDA.
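
For readers who want a feel for what a pure-PyTorch replacement can look like, here is a minimal sketch. It is not the code from this issue. It assumes queries, keys, and values are packed along the token dimension, with sequence boundaries given as cumulative lengths. The function name `varlen_causal_attention` and its arguments are hypothetical. It relies on `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+) and covers only the prefill-style causal case, not decoding from a paged KV cache.

```python
import torch
import torch.nn.functional as F


def varlen_causal_attention(q, k, v, cu_seqlens):
    """Causal attention over packed variable-length sequences (hypothetical helper).

    q:          (total_tokens, num_heads, head_dim)
    k, v:       (total_tokens, num_kv_heads, head_dim)
    cu_seqlens: 1-D tensor of cumulative lengths, e.g. [0, len0, len0 + len1]
    """
    num_heads, num_kv_heads = q.shape[1], k.shape[1]
    group = num_heads // num_kv_heads  # query heads per KV head (grouped-query attention)
    out = torch.empty_like(q)
    bounds = cu_seqlens.tolist()
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Reshape one sequence to (1, heads, seq_len, head_dim) for SDPA.
        qs = q[start:end].transpose(0, 1).unsqueeze(0)
        # Repeat each KV head so it lines up with its group of query heads.
        ks = k[start:end].transpose(0, 1).repeat_interleave(group, dim=0).unsqueeze(0)
        vs = v[start:end].transpose(0, 1).repeat_interleave(group, dim=0).unsqueeze(0)
        o = F.scaled_dot_product_attention(qs, ks, vs, is_causal=True)
        out[start:end] = o.squeeze(0).transpose(0, 1)
    return out
```

Looping over sequences is slower than a fused kernel, but it runs on any device PyTorch supports, including CPU.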

tiehexue avatar Nov 05 '25 06:11 tiehexue

@tiehexue Hey, can you help me remove the Triton dependencies? I want to learn from the code, but I can't run it directly (because of the Triton dependencies).

gbdjxgp avatar Nov 10 '25 15:11 gbdjxgp

Just remove the Triton dependency and use store_kvcache_pytorch.
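
As a rough illustration of how a Triton-free KV cache write can work, here is a minimal sketch. It is not the store_kvcache_pytorch from this issue. The name `store_kvcache_sketch`, the cache layout `(num_blocks, block_size, num_kv_heads, head_dim)`, and the convention that a negative slot means "skip this token" are all assumptions made for the example.

```python
import torch


def store_kvcache_sketch(key, value, k_cache, v_cache, slot_mapping):
    """Write per-token keys/values into a paged cache using plain indexing.

    key, value:       (num_tokens, num_kv_heads, head_dim)
    k_cache, v_cache: (num_blocks, block_size, num_kv_heads, head_dim), contiguous (assumed layout)
    slot_mapping:     (num_tokens,) flat slot index per token; negative means skip (assumed)
    """
    num_kv_heads, head_dim = key.shape[1], key.shape[2]
    # View the cache as a flat list of slots; the view shares memory with the cache.
    k_flat = k_cache.view(-1, num_kv_heads, head_dim)
    v_flat = v_cache.view(-1, num_kv_heads, head_dim)
    valid = slot_mapping >= 0
    slots = slot_mapping[valid].long()
    # Indexed assignment scatters each token into its slot in place.
    k_flat[slots] = key[valid]
    v_flat[slots] = value[valid]
```

Because the writes go through a view, the original `k_cache` and `v_cache` tensors are updated in place, which is the same effect a scatter kernel would have.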

tiehexue avatar Nov 13 '25 13:11 tiehexue