
Implementation of Recurrent Memory Transformer (NeurIPS 2022 paper) in PyTorch

8 recurrent-memory-transformer-pytorch issues

RMT-R, or Recurrent Memory Transformer - Retrieval, is a new paper from the lab that describes a methodology to inject past M_{t-1} memories into a retrieval cross-attention head to...
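
Even from this truncated description, the general shape of such a retrieval path is easy to sketch. Below is a conceptual illustration in plain PyTorch, not the RMT-R paper's actual mechanism; the shapes, the use of `nn.MultiheadAttention`, and the residual injection are all assumptions made only for illustration:

```python
import torch
import torch.nn as nn

dim, num_heads = 512, 8
cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first = True)

hidden = torch.randn(2, 1024, dim)            # current segment hidden states
past_memories = torch.randn(2, 4 * 128, dim)  # M_{t-1}, M_{t-2}, ... concatenated into one bank

# current hidden states query the bank of past memories
retrieved, _ = cross_attn(query = hidden, key = past_memories, value = past_memories)
hidden = hidden + retrieved                   # residual injection of the retrieved memories
```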

During the first run, `mems == None`, and the model doesn't attend to any "read" tokens, as per: https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/3be7d43604c6921a7dbdc68f88c7f3c534f82d2a/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py#L350-L355

Why not attend to `read_memory_emb`, and replace with `read_mem_length =...`
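
For context, here is a minimal sketch (plain PyTorch, not the library's actual code) of the two behaviours being compared: skipping the read tokens entirely on the first segment, versus falling back to the learned `read_memory_emb` so the token layout is identical on every step. The helper name and shapes below are made up for illustration:

```python
import torch
import torch.nn as nn

batch, read_mem_len, dim = 2, 128, 512

# learned fallback embedding, analogous in spirit to read_memory_emb in the linked file
read_memory_emb = nn.Parameter(torch.zeros(read_mem_len, dim))

def build_read_memories(mems):
    if mems is not None:
        # later segments: attend to the memories written by the previous segment
        return mems
    # proposed alternative: attend to the learned embedding instead of dropping the read tokens
    return read_memory_emb.unsqueeze(0).expand(batch, -1, -1)
```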

```
net = RecurrentMemoryTransformer(
    seq_len=1024,
    num_tokens=256,
    num_memory_tokens=128,
    dim=512,
    depth=1,
    causal=True,
    heads=4,
    dim_head=128,
    use_flash_attn=True,
    rotary_pos_emb=True
).eval()

x = torch.randint(0, 256, (8, 1024))
jit = torch.jit.trace(net, (x,))
x = torch.randint(0, 256, (8, ...
```

@lucidrains Do you have any advice on how to adapt `RecurrentMemoryTransformerWrapper` so that it works with CTC?
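
For what it's worth, the CTC side of this is standard PyTorch; the open question is only how to get per-frame logits out of the wrapper, so the model call is deliberately omitted in the generic sketch below (shapes and the blank index are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

batch, seq_len, num_classes = 4, 1024, 256           # class 0 reserved as the CTC blank
logits = torch.randn(batch, seq_len, num_classes)     # placeholder for model output

log_probs = logits.log_softmax(dim = -1).transpose(0, 1)   # F.ctc_loss expects (T, N, C)
targets = torch.randint(1, num_classes, (batch, 100))
input_lengths = torch.full((batch,), seq_len, dtype = torch.long)
target_lengths = torch.full((batch,), 100, dtype = torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank = 0)
```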

What is a good number for `seq_len`? What are the trade-offs for shorter or longer `seq_len`? Like, why can't `seq_len == 1`? Infinite recurrence is infinite recurrence no matter what...
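
One way to see the trade-off is a back-of-envelope count (illustrative numbers only, assuming read and write memory tokens both join every segment): a smaller `seq_len` makes each segment's attention cheaper, but multiplies the number of recurrent steps that gradients and memories have to survive.

```python
total_len, num_mem = 65536, 128

for seq_len in (1, 64, 1024, 8192):
    steps = total_len // seq_len                      # recurrent steps to cover the full sequence
    attn_cost = (seq_len + 2 * num_mem) ** 2          # read + write memory tokens join every segment
    print(f"seq_len={seq_len:>5}  steps={steps:>6}  per-segment attention ~{attn_cost:,}")
```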

It looks like, from https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L62-L67 and https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L93-L99, we manually configure `F.scaled_dot_product_attention()`. The [documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) says: "All implementations are enabled by default. Scaled dot product attention attempts to automatically select...
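
To make the contrast concrete, here is a sketch of the two approaches being discussed (requires a CUDA device; that the linked `attend.py` lines use the `sdp_kernel` context manager is my reading of the snippet, not something confirmed here):

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 4, 1024, 64, device = 'cuda', dtype = torch.float16)

# default behaviour: all backends enabled, PyTorch picks one per call
out = F.scaled_dot_product_attention(q, k, v, is_causal = True)

# explicit configuration: force the flash kernel only (errors if flash cannot handle the inputs)
with torch.backends.cuda.sdp_kernel(enable_flash = True, enable_math = False, enable_mem_efficient = False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal = True)
```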

I've got a few days of full access to a cluster of about 8 A6000s and I'm itching to put them to some insane task. I hadn't even considered this, but...

I was looking at the rotary position embedding code path (https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/35cd18deeb7965491873fcba4a15d581106eae39/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py#L414) and noticed this comment: `# rotary embedding - offset main positions by 10000, and keep all memories at position...`
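
For anyone else reading, the position layout that comment seems to describe could look roughly like the sketch below (illustrative only; which fixed position the memories sit at is exactly what the truncated comment specifies, so 0 here is a placeholder assumption):

```python
import torch

seq_len, num_mem, offset = 1024, 128, 10000

mem_pos  = torch.zeros(num_mem, dtype = torch.long)   # memories pinned at one fixed position (assumed 0)
main_pos = torch.arange(seq_len) + offset             # main sequence positions offset by 10000
positions = torch.cat((mem_pos, main_pos, mem_pos))   # [read mems | sequence | write mems]
```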