Ali Hassani
I don't know how useful this might be, but in some cases attempts to run DINOv2 on CPU would fail because the xformers requirement could be satisfied, but the model...
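A hedged sketch of the kind of availability guard involved (illustrative only, not the exact DINOv2 code):

```python
import torch

# Illustrative guard: only report xformers as available if it both
# imports and can actually run, since its fused attention kernels
# require CUDA and will fail on a CPU-only machine.
try:
    import xformers.ops  # noqa: F401
    XFORMERS_AVAILABLE = torch.cuda.is_available()
except ImportError:
    XFORMERS_AVAILABLE = False
```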
Hello, I'm unsure if this repository is still maintained, but I had to resolve this on my own so I figured I'd open a PR as well in case it...
NATTEN ops break torch.compile graphs because they are not registered as native torch ops. The correct way to do this since PyTorch 2.0 is to register the C++ ops through torch.library...
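For illustration, a minimal sketch of that registration path, assuming torch>=2.4 (`mylib::scaled_copy` is a hypothetical op name, not an actual NATTEN op):

```python
import torch
from torch.library import custom_op

# Register a custom op so torch.compile sees it as a first-class torch op
# instead of graph-breaking on an opaque Python call.
@custom_op("mylib::scaled_copy", mutates_args=())
def scaled_copy(x: torch.Tensor, scale: float) -> torch.Tensor:
    return (x * scale).contiguous()

# A "fake" (meta) implementation lets the compiler trace output shapes
# without running the real kernel.
@scaled_copy.register_fake
def _(x: torch.Tensor, scale: float) -> torch.Tensor:
    return torch.empty_like(x)

fn = torch.compile(lambda x: scaled_copy(x, 2.0), fullgraph=True)
print(fn(torch.randn(4)))  # compiles without a graph break
```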
Adds "Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level" to publications. TLDR; Neighborhood attention requires treating the attention problem as two batched GETTs instead...
Adds torch.compile support for torch>=2.4.0 by refactoring the ops/functions Python interface and optionally registering libnatten ops with torch.library instead of creating autograd ops. More information on why all of these...
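On the autograd side, a hedged sketch of registering a backward through torch.library rather than a torch.autograd.Function (again assuming torch>=2.4; `mylib::square` is a toy stand-in for a libnatten op):

```python
import torch
from torch.library import custom_op, register_autograd

@custom_op("mylib::square", mutates_args=())
def square(x: torch.Tensor) -> torch.Tensor:
    return x * x

@square.register_fake
def _(x: torch.Tensor) -> torch.Tensor:
    return torch.empty_like(x)

def setup_context(ctx, inputs, output):
    (x,) = inputs
    ctx.save_for_backward(x)

def backward(ctx, grad):
    (x,) = ctx.saved_tensors
    return grad * 2 * x  # d/dx (x^2) = 2x

# Attach the backward to the op; no autograd.Function subclass needed.
register_autograd("mylib::square", backward, setup_context=setup_context)

x = torch.randn(3, requires_grad=True)
square(x).sum().backward()
print(x.grad)  # equals 2 * x
```

Registered this way, the op stays a single node in the compiled graph, and its autograd and fake-tensor support compose with torch.compile instead of forcing a fallback to eager.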
Backward pass for attention merging needs to be handled manually: dQs from the different KV branches should simply be added together elementwise, since the query tensor is shared across branches. See https://github.com/Dao-AILab/flash-attention/issues/1137
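To make the merge concrete, a minimal sketch of log-sum-exp attention merging in plain PyTorch (assumed shapes noted in comments; this is illustrative, not a flash-attention API):

```python
import torch

# out_i: [B, H, N, D] partial attention outputs over disjoint KV branches
# lse_i: [B, H, N]    log-sum-exp of the attention logits for each branch
def merge_attention(out1, lse1, out2, lse2):
    lse = torch.logaddexp(lse1, lse2)
    w1 = torch.exp(lse1 - lse).unsqueeze(-1)
    w2 = torch.exp(lse2 - lse).unsqueeze(-1)
    return w1 * out1 + w2 * out2, lse

# In the backward pass, each branch produces its own dQ for the shared
# query tensor; the total gradient is their elementwise sum:
#   dQ = dQ_branch1 + dQ_branch2
```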
**DO NOT MERGE**

Ampere FNA only for now. Allows optionally clipping dot products according to some floating point range (min, max).

Issue: #249

TODO:
- [ ] Ensure bwd correctness...
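As a plain-PyTorch sketch of what the clipping means here (illustrative only, not the fused FNA kernel; `clip_min`/`clip_max` stand in for the PR's optional (min, max) range):

```python
import torch

# Clamp the attention logits to [clip_min, clip_max] before softmax;
# the fused kernel would apply the same clamp inside the dot-product stage.
def clipped_scores(q, k, clip_min=None, clip_max=None):
    scores = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
    if clip_min is not None or clip_max is not None:
        scores = scores.clamp(min=clip_min, max=clip_max)
    return scores.softmax(dim=-1)
```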