memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
I have two questions about the key and value calculation in Attention (and similarly for KNNAttention). The relevant line is: https://github.com/lucidrains/memorizing-transformers-pytorch/blob/83fa1479d6f7881dd977fbff55681e709e3b250e/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py#L135 1. Why is there only one Linear layer `to_kv`,...
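A likely reason for a single `to_kv` layer is the common trick of fusing the key and value projections into one linear map and splitting its output, which is mathematically identical to two separate projections. A minimal NumPy sketch (all names and sizes here are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, dim_head = 8, 4  # model dim and per-head dim (illustrative values)

# One fused weight matrix replaces separate W_k and W_v:
# it maps dim -> 2 * dim_head, and the output is split into k and v.
W_kv = rng.standard_normal((dim, 2 * dim_head))

x = rng.standard_normal((3, dim))  # 3 token embeddings
kv = x @ W_kv                      # shape (3, 2 * dim_head)
k, v = np.split(kv, 2, axis=-1)    # each shape (3, dim_head)

# Equivalent to two separate projections using slices of the fused weights:
k_sep = x @ W_kv[:, :dim_head]
v_sep = x @ W_kv[:, dim_head:]
assert np.allclose(k, k_sep) and np.allclose(v, v_sep)
print(k.shape, v.shape)
```

Fusing the two projections into one matmul is purely an efficiency/readability choice; the learned parameters are the same as with two separate `nn.Linear` layers.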
https://github.com/lucidrains/memorizing-transformers-pytorch/blob/83fa1479d6f7881dd977fbff55681e709e3b250e/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py#L237 Shouldn't this be (1-scale)?
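The question is about how the learned gate mixes memory attention with local attention. If the gate weights one branch by `scale`, weighting the other by `(1 - scale)` keeps the mixture convex (weights sum to 1). A hypothetical scalar sketch of that pattern, not the repo's actual code:

```python
import math

def mix(gate_bias, mem, local):
    """Sigmoid-gated convex combination of two attention outputs.

    g is in (0, 1), so g * mem + (1 - g) * local always sums the two
    branch weights to exactly 1 (the behavior the issue asks about).
    """
    g = 1.0 / (1.0 + math.exp(-gate_bias))  # sigmoid
    return g * mem + (1.0 - g) * local

out = mix(0.0, mem=2.0, local=4.0)  # g = 0.5, so the result is the midpoint
print(out)  # 3.0
```

Whether the repo intends a convex combination or two independent scales is exactly what the issue is asking the author to confirm.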
Hey! Cool repo. I like all the kNN+LM methods. Did you do some runs yet? Anything interesting to report?
Hello and thanks for this implementation! Do you know of any solutions to efficiently solve the "hard reset" problem in FAISS? I know that one could use IndexFlatL2 but that's...
When I run train.py, I get this error: "index out of range: Tried to access index 10218 out of table with 255 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418"
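That error typically means a token id in the data exceeds the embedding table size: the table has 255 rows but the input contains id 10218. A small stdlib-only sketch of the mismatch (variable names are hypothetical):

```python
# Hypothetical reproduction of the mismatch behind the error: the
# embedding table has num_tokens rows, but the data pipeline emits a
# token id that is >= num_tokens.
num_tokens = 255   # rows in the embedding table, as in the error message
token_id = 10218   # offending id from the error message

if token_id >= num_tokens:
    # This is the condition that triggers "index out of range".
    # Fixes: enlarge num_tokens to cover the tokenizer's vocabulary,
    # or fix the tokenization so ids stay within the table.
    msg = "token id out of range; increase num_tokens or fix tokenization"
    print(msg)
```

In other words, the model's `num_tokens` (or equivalent vocabulary-size argument) must be at least `max(token_id) + 1` over the whole dataset.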
Thank you so much for the great implementation. I would like to ask whether your implementation of Memorizing Transformers could support multi-card distributed training like the original paper. If you distribute...
Current environment: - faiss 1.7.1 - faiss-cpu 1.7.4 - joblib 1.3.1 - numpy 1.25.1 - pip 23.1.2 - setuptools 67.8.0 - wheel 0.38.4. I haven't installed PyTorch yet, because not...
Curious/puzzled: would Google really release their model in PyTorch? Is this the official implementation of Memorizing Transformers? (btw, great work!)