Horace He

242 comments by Horace He

In general, I think for data structures that can be augmented, like segtrees or (as we recently learned) Link Cut Trees, we should try to separate the code that's meant...
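To illustrate the kind of separation I mean, here's a minimal sketch (my own illustration, not code from any PR here) of an iterative segtree where the tree mechanics only touch the augmentation through a `combine`/`identity` pair, so swapping the augmentation never means editing the tree code:

```python
class SegTree:
    """Iterative point-update / range-query segment tree.

    All augmentation-specific behaviour lives in (combine, identity);
    the tree mechanics below never look inside the values.
    """
    def __init__(self, data, combine, identity):
        self.n = len(data)
        self.combine = combine
        self.identity = identity
        self.tree = [identity] * self.n + list(data)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, pos, value):
        pos += self.n
        self.tree[pos] = value
        while pos > 1:
            pos //= 2
            self.tree[pos] = self.combine(self.tree[2 * pos], self.tree[2 * pos + 1])

    def query(self, lo, hi):
        # Aggregate over the half-open range [lo, hi).
        res_l, res_r = self.identity, self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                res_l = self.combine(res_l, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                res_r = self.combine(self.tree[hi], res_r)
            lo //= 2
            hi //= 2
        return self.combine(res_l, res_r)

# Changing the augmentation is just a different (combine, identity) pair:
sums = SegTree([3, 1, 4, 1, 5], combine=lambda a, b: a + b, identity=0)
mins = SegTree([3, 1, 4, 1, 5], combine=min, identity=float("inf"))
assert sums.query(1, 4) == 6 and mins.query(1, 4) == 1
```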

That implementation does add an extra `log n` factor though. Could be worth it to replace it regardless, depending on benchmark results.

@PotatoHashing would you still like to finish this PR? I want to get an LCT implementation in, ideally this one. Alternatively, I could take over this PR if...

> I skipped this before because it didn't make a big difference in my tests (maybe I got the heuristic wrong, or my test cases were bad, I don't...

Has there been any progress on this front? If not, I would appreciate a pointer as to what needs to be changed so I can poke around for a bit.

Would be sick!

I'm still somewhat confused. Is it not possible to implement 2D Natten with a 1D FMHA + attention mask? To clarify, this diagram is showing a 2D Natten, right? I...
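For reference, the naive version of what I'm asking about would look roughly like the sketch below (my own illustration, not NATTEN's API): build a boolean 2D-neighborhood mask over the flattened H×W tokens and hand it to a 1D fused attention via `attn_mask`. Note this simply truncates the window at image borders, whereas NATTEN clamps the window so every query sees a full k×k neighborhood, so the semantics only approximately match:

```python
import torch
import torch.nn.functional as F

def natten2d_mask(H, W, window, device=None):
    # True where query (i1, j1) may attend to key (i2, j2), i.e. both the
    # row offset and the column offset fit inside the window radius.
    ys = torch.arange(H, device=device)
    xs = torch.arange(W, device=device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([yy.flatten(), xx.flatten()], dim=-1)  # (H*W, 2)
    diff = (coords[:, None, :] - coords[None, :, :]).abs()      # (H*W, H*W, 2)
    return (diff <= window // 2).all(dim=-1)                    # (H*W, H*W) bool

B, heads, H, W, d = 2, 4, 16, 16, 32
q, k, v = (torch.randn(B, heads, H * W, d) for _ in range(3))
mask = natten2d_mask(H, W, window=7)                            # broadcast over B, heads
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)   # (B, heads, H*W, d)
```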

> I actually haven't given this much thought recently, but my guess is that it won't be possible. So, to clarify more explicitly, I plotted out how the attention mask...

Thanks for the clarifications! I understand why FNA2d is implemented the way it is, but I'd nevertheless be curious about its performance compared to local attention. In particular, I think...
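As a rough way to poke at this (my own sketch, not a benchmark of FNA2d itself), one could time dense SDPA against the masked variant from the sketch above with `torch.utils.benchmark`; comparing against FNA2d proper would mean swapping in NATTEN's own op, whose call signature I won't guess at here:

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

def natten2d_mask(H, W, window, device=None):
    # Same truncated-neighborhood mask as in the earlier sketch.
    ys = torch.arange(H, device=device)
    xs = torch.arange(W, device=device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([yy.flatten(), xx.flatten()], dim=-1)
    diff = (coords[:, None, :] - coords[None, :, :]).abs()
    return (diff <= window // 2).all(dim=-1)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
B, heads, H, W, d = 4, 8, 32, 32, 64
q, k, v = (torch.randn(B, heads, H * W, d, device=device, dtype=dtype) for _ in range(3))
mask = natten2d_mask(H, W, window=7, device=device)

# Note: an arbitrary attn_mask generally pushes SDPA off the flash backend,
# which is part of why a fused NA kernel can come out ahead.
for label, kwargs in [("dense SDPA", {}), ("masked SDPA (2D NA pattern)", {"attn_mask": mask})]:
    timer = benchmark.Timer(
        stmt="F.scaled_dot_product_attention(q, k, v, **kw)",
        globals={"F": F, "q": q, "k": k, "v": v, "kw": kwargs},
        label=label,
    )
    print(timer.blocked_autorange())
```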

> Could you clarify this? If we're modifying a 1-D attention kernel to do 2-D NA with attention masking, it would still have to do extra indexing computation and checks (and...