recurrent-memory-transformer-pytorch
Question: configuring scaled_dot_product_attention
It looks like, from https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L62-L67 and https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L93-L99, that we manually configure the backends used by `F.scaled_dot_product_attention()`.
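For concreteness, this is roughly the pattern I'm referring to (a paraphrased sketch, not the exact code in `attend.py`, and assuming a CUDA device is available): the allowed backends are pinned with the `torch.backends.cuda.sdp_kernel` context manager around the call.

```python
import torch
import torch.nn.functional as F

# dummy (batch, heads, seq, dim_head) tensors, just for illustration
q = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
k = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
v = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)

# explicitly restrict which SDPA backends may be used for this call
with torch.backends.cuda.sdp_kernel(
    enable_flash = True,
    enable_math = True,
    enable_mem_efficient = True
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal = True)
```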
The documentation says: "All implementations are enabled by default. Scaled dot product attention attempts to automatically select the most optimal implementation based on the inputs."
Can't we just let PyTorch decide?
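In other words, just calling it directly with no backend context manager, so the dispatcher picks flash / memory-efficient / math attention on its own (again a minimal sketch, assuming a CUDA device):

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
k = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
v = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)

# no sdp_kernel context manager - PyTorch auto-selects the backend
out = F.scaled_dot_product_attention(q, k, v, is_causal = True)
```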