Frank Odom
@mohamedelbahnasawi Are you trying to build an encoder-only model (e.g. BERT)? I don't have an immediate solution for the padding masks in that case, but I can look into it...
@mohamedelbahnasawi Got it -- I'll take a look. I believe [the fix should be here](https://github.com/fkodom/dilated-attention-pytorch/blob/87cda7579874b6485ea81a742b6a0dc51ffad6cc/dilated_attention_pytorch/dilated_attention.py#L94-L100). `xops` also allows you to pass a `Tensor` mask, in place of the `LowerTriangularMask` I...
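For reference, here's a minimal sketch of passing a dense `Tensor` bias to `xops.memory_efficient_attention` where a `LowerTriangularMask` would otherwise go. The shapes and the key-padding mask are made up for illustration (this is not the repo's code), and it assumes a CUDA device with `xformers` installed:

```python
import torch
import xformers.ops as xops

# Illustrative shapes only: batch=2, seq_len=8, heads=4, head_dim=16.
b, n, h, d = 2, 8, 4, 16
q = torch.randn(b, n, h, d, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Hypothetical key-padding mask: True where a token is padding.
padding_mask = torch.zeros(b, n, dtype=torch.bool, device="cuda")
padding_mask[:, -2:] = True  # pretend the last two tokens are padding

# Convert it into an additive attention bias: 0 for valid keys, -inf for padded keys.
bias = torch.zeros(b, h, n, n, device="cuda", dtype=q.dtype)
bias = bias.masked_fill(padding_mask[:, None, None, :], float("-inf"))

# Pass the Tensor bias in place of a LowerTriangularMask.
out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
```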
@Coluding Not sure I understand. Could you elaborate a bit? The forward pass is implemented here, and the backward pass can be done automatically with PyTorch. Are you thinking there...
@Coluding Yes, the backward pass works and scales roughly the same as `forward` (linear with sequence length). You can test that with a slightly modified `benchmark.py` script: ``` INFO:root:Benchmark dilated attention......
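If anyone wants to reproduce that, here's a rough sketch of timing forward + backward at a few sequence lengths. The `DilatedAttention` constructor arguments, shapes, and the `is_causal` keyword are assumptions based on the README, not the actual `benchmark.py` settings:

```python
import timeit

import torch
from dilated_attention_pytorch.dilated_attention import DilatedAttention

# Placeholder hyperparameters for illustration; benchmark.py uses its own.
attn = DilatedAttention(segment_lengths=[2048, 4096], dilation_rates=[1, 2]).cuda().half()

def fwd_bwd_time(seq_len: int, batch: int = 1, heads: int = 8, dim: int = 64) -> float:
    q = torch.randn(batch, seq_len, heads, dim, device="cuda", dtype=torch.float16, requires_grad=True)
    k = torch.randn_like(q, requires_grad=True)
    v = torch.randn_like(q, requires_grad=True)

    def run():
        out = attn(q, k, v, is_causal=True)
        out.sum().backward()  # include the backward pass in the timing
        q.grad = k.grad = v.grad = None
        torch.cuda.synchronize()

    run()  # warmup
    return min(timeit.repeat(run, number=1, repeat=5))

for n in (4096, 8192, 16384):
    print(f"seq_len={n}: {fwd_bwd_time(n):.4f} s")
```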
@Akbarable Letting you know I see this issue. 👀 I'm having a very busy week, so it may be a few days before I can dedicate time to this. Will update...
Sorry for the delay here. I don't have a Windows computer, so I'm not able to reproduce on my end. Is the error message truncated above? It seems like there...
@Rivian01 Sorry to hear that. Unfortunately, I don't have a Windows machine, so it's difficult for me to test/debug this. Are you familiar with Python? The `solve-semantle` CLI command is...