recurrent-memory-transformer-pytorch
Question: configuring scaled_dot_product_attention
It looks like, from https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L62-L67 and https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/98bf3091a29fbd65dbbb30ce00dd1cadd05fef2d/recurrent_memory_transformer_pytorch/attend.py#L93-L99, that we manually configure the backends used by `F.scaled_dot_product_attention()`.
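For concreteness, this is roughly the pattern I'm referring to (a paraphrased sketch, not the exact code in `attend.py`, and assuming a CUDA device is available): the allowed backends are pinned with the `torch.backends.cuda.sdp_kernel` context manager around the call.

```python
import torch
import torch.nn.functional as F

# dummy (batch, heads, seq, dim_head) tensors, just for illustration
q = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
k = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
v = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)

# explicitly restrict which SDPA backends may be used for this call
with torch.backends.cuda.sdp_kernel(
    enable_flash = True,
    enable_math = True,
    enable_mem_efficient = True
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal = True)
```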
The documentation says: "All implementations are enabled by default. Scaled dot product attention attempts to automatically select the most optimal implementation based on the inputs."
Can't we just let PyTorch decide?
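In other words, just calling it directly with no backend context manager, so the dispatcher picks flash / memory-efficient / math attention on its own (again a minimal sketch, assuming a CUDA device):

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
k = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)
v = torch.randn(1, 8, 128, 64, device = 'cuda', dtype = torch.float16)

# no sdp_kernel context manager - PyTorch auto-selects the backend
out = F.scaled_dot_product_attention(q, k, v, is_causal = True)
```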