memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch
I have two questions about the key and value calculation in Attention (and similarly for KNNAttention). The relevant line is: https://github.com/lucidrains/memorizing-transformers-pytorch/blob/83fa1479d6f7881dd977fbff55681e709e3b250e/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py#L135 1. Why is there only one Linear layer `to_kv`,...
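A likely reason for a single `to_kv` layer is the common trick of fusing the key and value projections into one linear map and splitting its output, which is mathematically identical to two separate projections. A minimal NumPy sketch (all names and sizes here are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, dim_head = 8, 4  # model dim and per-head dim (illustrative values)

# One fused weight matrix replaces separate W_k and W_v:
# it maps dim -> 2 * dim_head, and the output is split into k and v.
W_kv = rng.standard_normal((dim, 2 * dim_head))

x = rng.standard_normal((3, dim))  # 3 token embeddings
kv = x @ W_kv                      # shape (3, 2 * dim_head)
k, v = np.split(kv, 2, axis=-1)    # each shape (3, dim_head)

# Equivalent to two separate projections using slices of the fused weights:
k_sep = x @ W_kv[:, :dim_head]
v_sep = x @ W_kv[:, dim_head:]
assert np.allclose(k, k_sep) and np.allclose(v, v_sep)
print(k.shape, v.shape)
```

Fusing the two projections into one matmul is purely an efficiency/readability choice; the learned parameters are the same as with two separate `nn.Linear` layers.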
https://github.com/lucidrains/memorizing-transformers-pytorch/blob/83fa1479d6f7881dd977fbff55681e709e3b250e/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py#L237 Shouldn't this be (1-scale)?
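The question is about how the learned gate mixes memory attention with local attention. If the gate weights one branch by `scale`, weighting the other by `(1 - scale)` keeps the mixture convex (weights sum to 1). A hypothetical scalar sketch of that pattern, not the repo's actual code:

```python
import math

def mix(gate_bias, mem, local):
    """Sigmoid-gated convex combination of two attention outputs.

    g is in (0, 1), so g * mem + (1 - g) * local always sums the two
    branch weights to exactly 1 (the behavior the issue asks about).
    """
    g = 1.0 / (1.0 + math.exp(-gate_bias))  # sigmoid
    return g * mem + (1.0 - g) * local

out = mix(0.0, mem=2.0, local=4.0)  # g = 0.5, so the result is the midpoint
print(out)  # 3.0
```

Whether the repo intends a convex combination or two independent scales is exactly what the issue is asking the author to confirm.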
Hey! Cool repo. I like all the kNN+LM methods. Did you do some runs yet? Anything interesting to report?
Hello and thanks for this implementation! Do you know of any solutions to efficiently solve the "hard reset" problem in FAISS? I know that one could use IndexFlatL2 but that's...
When I run train.py, I get this error: "index out of range: Tried to access index 10218 out of table with 255 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418"
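That error typically means a token id in the data exceeds the embedding table size: the table has 255 rows but the input contains id 10218. A small stdlib-only sketch of the mismatch (variable names are hypothetical):

```python
# Hypothetical reproduction of the mismatch behind the error: the
# embedding table has num_tokens rows, but the data pipeline emits a
# token id that is >= num_tokens.
num_tokens = 255   # rows in the embedding table, as in the error message
token_id = 10218   # offending id from the error message

if token_id >= num_tokens:
    # This is the condition that triggers "index out of range".
    # Fixes: enlarge num_tokens to cover the tokenizer's vocabulary,
    # or fix the tokenization so ids stay within the table.
    msg = "token id out of range; increase num_tokens or fix tokenization"
    print(msg)
```

In other words, the model's `num_tokens` (or equivalent vocabulary-size argument) must be at least `max(token_id) + 1` over the whole dataset.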
Thank you so much for the great implementation. I would like to ask whether your implementation of Memorizing Transformers could support multi-card distributed training like the original paper. If you distribute...
Current environment: - faiss 1.7.1 - faiss-cpu 1.7.4 - joblib 1.3.1 - numpy 1.25.1 - pip 23.1.2 - setuptools 67.8.0 - wheel 0.38.4. I haven't installed PyTorch yet, because not...
Curious/puzzled: would Google really release their model in PyTorch? Is this the official implementation of Memorizing Transformers? (btw, great work!)