Nystromformer issues

Please add a license to this repo

First, thank you for sharing this project with us! Could you please add an explicit `LICENSE` file to the repo so that it's clear under what terms the content is...

mbrukman

Self-attention weights don't always sum to 1

2

Hi all, nice work. I have a question, however. I see within my problem setting that the self-attention (SA) weights don't always sum to ~1 (row-wise). I assume this...

DennisHaijma
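
A quick way to check the row-wise normalization this report describes is to measure how far each row sum of an attention matrix lies from 1. The sketch below uses NumPy and hypothetical helper names (`softmax`, `row_sum_deviation`); it is a generic diagnostic, not code from the repository, and it does not explain why approximated weights might deviate.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def row_sum_deviation(attn):
    """Largest absolute deviation of any row sum from 1."""
    return float(np.abs(attn.sum(axis=-1) - 1.0).max())

rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 8))   # placeholder attention scores
exact = softmax(scores)            # exact softmax attention weights

print(row_sum_deviation(exact))        # close to zero for exact softmax
print(row_sum_deviation(0.5 * exact))  # a deliberately unnormalized matrix
```

Running the same helper on an approximated attention matrix shows how large the deviation actually is in a given setting.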

score of softmax on Text4k; linformer-256 & nystrom-64 doesn't work

1

Hi, Thanks for the excellent work! I found some issues in my humble trials (I didn't change anything in the code): 1. using softmax attention on Text4k I got ~63.7...

ZiweiHe

Retrieval accuracy different from official JAX/FLAX implementation

1

I wonder why the Retrieval accuracy is almost 20% higher than the official JAX/FLAX implementation. As the paper says, "While we achieve consistent results reported in (Tay et al. 2020)...

cwq159

Incorrect initialization of pseudoinverse matrix calculation leads to convergence failure

11

Hi! First, let me thank you all for the amazing paper! It's detailed enough that I've managed to successfully reproduce Nyströmformer in Tensorflow and so far I'm impressed how well...

kpot
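
To make the role of the initial guess concrete, here is a minimal NumPy sketch of an iterative inverse approximation. It assumes the third-order iteration Z ← ¼ Z (13I − AZ (15I − AZ (7I − AZ))) with the starting point Z₀ = Aᵀ / (‖A‖₁ ‖A‖∞); the function names are hypothetical, and whether this matches the repository's code or the reporter's TensorFlow port is an assumption.

```python
import numpy as np

def initial_guess(a):
    # Scale the transpose by the 1-norm (max column sum) and the
    # infinity-norm (max row sum) of this particular matrix. For a
    # nonsingular `a` this keeps the spectral norm of I - a @ z0 below 1.
    norm_1 = np.abs(a).sum(axis=-2).max()
    norm_inf = np.abs(a).sum(axis=-1).max()
    return a.T / (norm_1 * norm_inf)

def iterative_inverse(a, n_iter=20):
    """Approximate the inverse of a square matrix `a` by fixed-point iteration."""
    identity = np.eye(a.shape[-1])
    z = initial_guess(a)
    for _ in range(n_iter):
        az = a @ z
        z = 0.25 * z @ (13 * identity - az @ (15 * identity - az @ (7 * identity - az)))
    return z

# A small, well-conditioned row-stochastic test matrix (placeholder data).
rng = np.random.default_rng(0)
scores = 3.0 * np.eye(4) + 0.1 * rng.normal(size=(4, 4))
a = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

initial_residual = np.linalg.norm(np.eye(4) - a @ initial_guess(a), 2)
z = iterative_inverse(a)
final_residual = np.linalg.norm(np.eye(4) - a @ z, 2)
print(initial_residual, final_residual)
```

The point of the sketch is that the scaling in `initial_guess` is computed from the matrix being inverted; a starting point whose residual norm is not below 1 gives no convergence guarantee.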

Does it support causal mask for GPT2-esque models?

7

For models with only decoder stacks, how can a causal mask be applied?

miguelvictor
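
For reference, this is what a causal mask does in ordinary (full) softmax attention: position i may attend only to positions at or before i. The NumPy sketch below uses a hypothetical function name and covers standard attention only; it does not show how, or whether, the same mask can be applied to the Nyström approximation.

```python
import numpy as np

def causal_softmax_attention(q, k, v):
    """Full softmax attention with a causal (lower-triangular) mask."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # True strictly above the diagonal marks future positions.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # The diagonal is never masked, so every row has a finite maximum.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 4)) for _ in range(3))
out, weights = causal_softmax_attention(q, k, v)
print(np.round(weights, 3))  # upper triangle is zero
```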

Influence of the "conv_kernel_size" within the proposed Nystrom Attention

4

Congrats on your great work! I am verifying your method on vision tasks and have a small concern on the influence of the "conv_kernel_size" of the 2D group-convolution in your...

PkuRainBow

Max length for text task

Hello, In `LRA/datasets/text.py` the `get_tc_datasets` is called with `max_length = 1000`. In the code this is then concatenated with `np.zeros(24)`, to get a length of 1024. But the maximum length...

arneeichholtz
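
The length arithmetic in this report can be reproduced in isolation. The sketch below uses placeholder token ids and an integer dtype chosen for illustration; only `max_length = 1000` and the 24 appended zeros come from the issue text.

```python
import numpy as np

max_length = 1000                               # value quoted in the issue
tokens = np.ones(max_length, dtype=np.int64)    # placeholder token ids (hypothetical)

# Append 24 zeros, as the issue describes, to reach a length of 1024.
padded = np.concatenate([tokens, np.zeros(24, dtype=np.int64)])
print(padded.shape)
```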

LRA cifar10.py never stops

When trying to run `python3 cifar10.py` to prepare the cifar10 dataset, the script gets into an infinite loop because `input_pipeline.get_cifar10_datasets` returns a dataset with infinitely many samples (it uses the `repeat`...

liranringel
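
The failure mode described here, iterating to exhaustion over a stream that repeats forever, can be illustrated without the actual pipeline. The sketch below uses plain Python iterators as a stand-in; `repeated` and `take` are hypothetical helpers, not functions from the repository. If the pipeline is built on `tf.data`, `Dataset.take(n)` plays a similar bounding role.

```python
import itertools

def repeated(samples):
    """Stand-in for a dataset with `repeat` applied: yields samples forever."""
    return itertools.cycle(samples)

def take(stream, n):
    """Consume at most n items, so the loop terminates on an endless stream."""
    return list(itertools.islice(stream, n))

samples = ["img0", "img1", "img2"]   # placeholder samples

# `for s in repeated(samples): ...` would never finish.
one_epoch = take(repeated(samples), len(samples))
print(one_epoch)
```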
