nano-vllm
attention without flash-attn for windows with GTX1650
flash-attn is not available for my GTX 1650 even though I spent hours compiling it under Windows WSL, so I rewrote this class in pure PyTorch. Most of the code was written by an LLM and then reviewed line by line to fix bugs. It works now. There is also a store_kvcache_pytorch that avoids Triton, so the whole thing should run without custom CUDA kernels.
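For reference, a pure-PyTorch fallback for flash-attn's fused kernel is essentially scaled dot-product attention with an explicit causal mask. This is a minimal sketch (the function name and tensor layout are my assumptions, not nano-vllm's exact signature):

```python
import math
import torch

def attention_pytorch(q, k, v):
    # Hypothetical flash-attn replacement: plain causal attention.
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    head_dim = q.size(-1)
    seq_len = q.size(-2)
    # Scaled attention scores: QK^T / sqrt(d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    # Causal mask: position i may not attend to positions j > i.
    mask = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

This materializes the full (seq_len, seq_len) score matrix, so it uses more memory than flash-attn, but it needs nothing beyond stock PyTorch and runs on any device.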
@tiehexue Hey, can you help me remove Triton dependencies? I want to learn the code, but can't run it directly(because of triton dependencies)
Just remove the Triton dependency and use store_kvcache_pytorch instead.
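The Triton kernel this replaces scatters new key/value tensors into the paged KV cache at the slots given by a slot mapping. In pure PyTorch that is just an advanced-indexing assignment; a sketch under assumed shapes (the cache layout here is my guess, check it against the repo's actual tensors):

```python
import torch

def store_kvcache_pytorch(key, value, k_cache, v_cache, slot_mapping):
    """Scatter new K/V entries into the cache without Triton.

    Assumed layout (hypothetical, verify against nano-vllm):
      key, value:        (num_tokens, num_heads, head_dim)
      k_cache, v_cache:  (num_slots, num_heads, head_dim)
      slot_mapping:      (num_tokens,) long tensor of destination slots
    """
    num_tokens = key.size(0)
    # Flatten the per-token features and write them to the mapped slots.
    # Indexing a view writes through to the underlying cache storage.
    k_cache.view(k_cache.size(0), -1)[slot_mapping] = key.reshape(num_tokens, -1)
    v_cache.view(v_cache.size(0), -1)[slot_mapping] = value.reshape(num_tokens, -1)
```

Because it is a plain tensor assignment, this works on CPU as well as GPU, which is what makes the Triton-free path runnable anywhere.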