memorizing-transformers-pytorch
Is it a T5 arch or a decoder-only, GPT-style arch?
T5 is an encoder-decoder architecture, not decoder-only. The paper uses a decoder-only (GPT-style) transformer, which this memorizing transformer also seems to be!
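For anyone unsure what the distinction means in practice, here is a rough sketch of the attention patterns involved. It is plain Python and not taken from this repo; the function names `causal_mask`, `bidirectional_mask`, and `render` are made up for illustration. A decoder-only model runs causal self-attention over a single sequence, while an encoder-decoder model like T5 has an encoder that attends bidirectionally, plus a causal decoder that cross-attends to the encoder output.

```python
def causal_mask(n):
    """Decoder-only (GPT-style): position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder side of an encoder-decoder model: every position sees every position."""
    return [[True] * n for _ in range(n)]


def render(mask):
    """Draw a mask as text: 'x' = allowed to attend, '.' = masked out."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print("causal (decoder-only):")
    print(render(causal_mask(4)))
    print("bidirectional (encoder):")
    print(render(bidirectional_mask(4)))
```

The lower-triangular pattern is what lets a decoder-only model be trained on next-token prediction over one long sequence.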