Jamie DeAntonis
I added an arg so that you can specify the number of segments the index vectors will be quantized to
whoops, sorry about that. just changed
We observed it in both. I read [here](https://github.com/lucidrains/performer-pytorch/issues/64#issuecomment-819568003) that caching may be the reason. Are you still planning to implement it?
Hi all, I'm happy to join this conversation if there's anything to be done. Is this too inefficient? ```python import numpy as np q0 = np.array([[1, 2, 3], [4, 5,...
> And I'm not sure the dimension of q0 (2, 3) mean (L, D)? A sequence with two tokens and each token has 3 dimensions? Yes, that's what I meant. > If...
Does this mean `CausalDotProduct` in `fast-attention` is what we want?
Isn't this only solvable by implementing the for-loop directly in a lower-level language? I imagine that's effectively what fast attention does.
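To make the for-loop concrete: here is a minimal numpy sketch of the prefix-sum form of causal linear attention that the loop would implement. The function name, shapes, and the small epsilon in the denominator are my own choices for illustration, not anything from fast-attention itself.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Naive causal linear attention via an explicit loop over positions.

    q, k: (L, D) feature-mapped queries/keys (assumed non-negative);
    v: (L, M) values. Returns (L, M) outputs.
    """
    L, D = q.shape
    M = v.shape[1]
    out = np.zeros((L, M))
    s = np.zeros((D, M))   # running prefix sum of outer(k_i, v_i)
    z = np.zeros(D)        # running prefix sum of k_i (normalizer)
    for i in range(L):
        s += np.outer(k[i], v[i])
        z += k[i]
        # epsilon is an arbitrary choice here to avoid division by zero
        out[i] = (q[i] @ s) / (q[i] @ z + 1e-6)
    return out
```

The point of the loop is that each position only ever sees the prefix sums up to itself, which is exactly the causal mask; the cost is that a Python-level loop over L is slow, which is why a C++/CUDA kernel (as in fast-attention's `CausalDotProduct`) is the practical route.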
Can we talk over a call? I just emailed you
I was just reading the fast attention code, and I think it does exactly what we want. Typing is really the only reason the c++ code is torch-specific. Otherwise, all...
I don't think I'm the guy to do this (I don't use c++ or tensorflow), but I think this is a pretty easy problem for someone who at least knows...
@ice-americano (who I work with) ran it and it seemed to work to some degree. Compared to regular attention, he was getting significant improvements in memory usage, but a noticeable...