dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
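For context, here is a minimal self-contained sketch of the dilated-attention idea from the LongNet paper: the sequence is split into segments, each segment is sparsified by keeping every r-th position, and ordinary attention runs within each sparsified segment. The function name and arguments below are illustrative only, not this repo's actual API; a full implementation mixes several (segment length, dilation rate) pairs across heads so that every position is covered.

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, segment_length, dilation_rate):
    """Single-head dilated attention sketch (illustrative, not this repo's API).

    q, k, v: (batch, seq_len, dim). seq_len must be divisible by
    segment_length, and segment_length by dilation_rate.
    """
    b, n, d = q.shape
    s, r = segment_length, dilation_rate
    # Split the sequence into contiguous segments of length s.
    q = q.view(b, n // s, s, d)
    k = k.view(b, n // s, s, d)
    v = v.view(b, n // s, s, d)
    # Dilation: within each segment, keep every r-th row.
    q, k, v = q[:, :, ::r], k[:, :, ::r], v[:, :, ::r]
    # Dense attention inside each sparsified segment
    # (the leading dims act as batch dims).
    out = F.scaled_dot_product_attention(q, k, v)
    # Scatter attended rows back to their original positions; skipped rows
    # stay zero in this sketch, whereas LongNet combines multiple (s, r)
    # pairs so every position is attended by at least one configuration.
    full = torch.zeros(b, n // s, s, d, dtype=out.dtype, device=out.device)
    full[:, :, ::r] = out
    return full.view(b, n, d)

# Example: 8k tokens, segments of 2048, keeping every 4th position.
x = torch.randn(1, 8192, 64)
y = dilated_attention(x, x, x, segment_length=2048, dilation_rate=4)
assert y.shape == x.shape
```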
I got this during the benchmark: ```python # assert len(unknown_axes) == 1, 'this is enforced when recipe is created, so commented out' --> 186 if isinstance(length, int) and isinstance(known_product, int)...
Hi @fkodom, I really like your implementation, and I wanted to use dilated attention in a vanilla transformer model to see how things work. Right now, I am facing a...
Hi! First of all, thanks for your great implementation. I think it is awesome, and I like it a lot. I was wondering if you have also implemented a backward...
Hello Frank! I love what you have created, and I am having a great time reading through and parsing your implementation of the paper. It appears you have nailed the...