
Implementation of Recurrent Memory Transformer (NeurIPS 2022 paper) in PyTorch

8 recurrent-memory-transformer-pytorch issues

RMT-R, or Recurrent Memory Transformer - Retrieval, is a new paper from the lab that describes a methodology to inject past M_{t-1} memories into a retrieval cross-attention head to...
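
Even from this truncated description, the general shape of such a retrieval path is easy to sketch. Below is a conceptual illustration in plain PyTorch, not the RMT-R paper's actual mechanism; the shapes, the use of `nn.MultiheadAttention`, and the residual injection are all assumptions made only for illustration:

```python
import torch
import torch.nn as nn

dim, num_heads = 512, 8
cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first = True)

hidden = torch.randn(2, 1024, dim)            # current segment hidden states
past_memories = torch.randn(2, 4 * 128, dim)  # M_{t-1}, M_{t-2}, ... concatenated into one bank

# current hidden states query the bank of past memories
retrieved, _ = cross_attn(query = hidden, key = past_memories, value = past_memories)
hidden = hidden + retrieved                   # residual injection of the retrieved memories
```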

During the first run, `mems == None`, and the model doesn't attend to any "read" tokens, as per: https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/3be7d43604c6921a7dbdc68f88c7f3c534f82d2a/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py#L350-L355

Why not attend to `read_memory_emb`, and replace with `read_mem_length =...`
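
For context, here is a minimal sketch (plain PyTorch, not the library's actual code) of the two behaviours being compared: skipping the read tokens entirely on the first segment, versus falling back to the learned `read_memory_emb` so the token layout is identical on every step. The helper name and shapes below are made up for illustration:

```python
import torch
import torch.nn as nn

batch, read_mem_len, dim = 2, 128, 512

# learned fallback embedding, analogous in spirit to read_memory_emb in the linked file
read_memory_emb = nn.Parameter(torch.zeros(read_mem_len, dim))

def build_read_memories(mems):
    if mems is not None:
        # later segments: attend to the memories written by the previous segment
        return mems
    # proposed alternative: attend to the learned embedding instead of dropping the read tokens
    return read_memory_emb.unsqueeze(0).expand(batch, -1, -1)
```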

```
net = RecurrentMemoryTransformer(
    seq_len=1024,
    num_tokens=256,
    num_memory_tokens=128,
    dim=512,
    depth=1,
    causal=True,
    heads=4,
    dim_head=128,
    use_flash_attn=True,
    rotary_pos_emb=True
).eval()

x = torch.randint(0, 256, (8, 1024))
jit = torch.jit.trace(net, (x,))
x = torch.randint(0, 256, (8, ...
```

@lucidrains Do you have any advice on how to adapt `RecurrentMemoryTransformerWrapper` so that it works with CTC?
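
For what it's worth, the CTC side of this is standard PyTorch; the open question is only how to get per-frame logits out of the wrapper, so the model call is deliberately omitted in the generic sketch below (shapes and the blank index are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

batch, seq_len, num_classes = 4, 1024, 256           # class 0 reserved as the CTC blank
logits = torch.randn(batch, seq_len, num_classes)     # placeholder for model output

log_probs = logits.log_softmax(dim = -1).transpose(0, 1)   # F.ctc_loss expects (T, N, C)
targets = torch.randint(1, num_classes, (batch, 100))
input_lengths = torch.full((batch,), seq_len, dtype = torch.long)
target_lengths = torch.full((batch,), 100, dtype = torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank = 0)
```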

What is a good number for `seq_len`? What are the trade-offs for shorter or longer `seq_len`? Like, why can't `seq_len == 1`? Infinite recurrence is infinite recurrence no matter what...
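
One way to see the trade-off is a back-of-envelope count (illustrative numbers only, assuming read and write memory tokens both join every segment): a smaller `seq_len` makes each segment's attention cheaper, but multiplies the number of recurrent steps that gradients and memories have to survive.

```python
total_len, num_mem = 65536, 128

for seq_len in (1, 64, 1024, 8192):
    steps = total_len // seq_len                      # recurrent steps to cover the full sequence
    attn_cost = (seq_len + 2 * num_mem) ** 2          # read + write memory tokens join every segment
    print(f"seq_len={seq_len:>5}  steps={steps:>6}  per-segment attention ~{attn_cost:,}")
```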

It looks like, from https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L62-L67 and https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L93-L99, we manually configure `F.scaled_dot_product_attention()`. The [documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) says: "All implementations are enabled by default. Scaled dot product attention attempts to automatically select...
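
To make the contrast concrete, here is a sketch of the two approaches being discussed (requires a CUDA device; that the linked `attend.py` lines use the `sdp_kernel` context manager is my reading of the snippet, not something confirmed here):

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 4, 1024, 64, device = 'cuda', dtype = torch.float16)

# default behaviour: all backends enabled, PyTorch picks one per call
out = F.scaled_dot_product_attention(q, k, v, is_causal = True)

# explicit configuration: force the flash kernel only (errors if flash cannot handle the inputs)
with torch.backends.cuda.sdp_kernel(enable_flash = True, enable_math = False, enable_mem_efficient = False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal = True)
```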

I've got a few days of full access to a cluster of about 8 A6000s and I'm itching to put them to some insane task. I hadn't even considered this, but...

I was looking at the rotary position embedding code path (https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/35cd18deeb7965491873fcba4a15d581106eae39/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py#L414) and noticed this comment: `# rotary embedding - offset main positions by 10000, and keep all memories at position...`
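
For anyone else reading, the position layout that comment seems to describe could look roughly like the sketch below (illustrative only; which fixed position the memories sit at is exactly what the truncated comment specifies, so 0 here is a placeholder assumption):

```python
import torch

seq_len, num_mem, offset = 1024, 128, 10000

mem_pos  = torch.zeros(num_mem, dtype = torch.long)   # memories pinned at one fixed position (assumed 0)
main_pos = torch.arange(seq_len) + offset             # main sequence positions offset by 10000
positions = torch.cat((mem_pos, main_pos, mem_pos))   # [read mems | sequence | write mems]
```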