My guess is there is no difference, based on how the masks are used in the [Attention class](https://github.com/lucidrains/x-transformers/blob/c1283da7f4d87ecfe583f305f7c988097987766c/x_transformers/x_transformers.py#L384)
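For reference, a minimal sketch of how a boolean padding mask is typically passed through `x-transformers` before it reaches that Attention class (the model sizes and kwargs here are illustrative, not taken from this thread):

```python
import torch
from x_transformers import TransformerWrapper, Encoder

model = TransformerWrapper(
    num_tokens=256,
    max_seq_len=128,
    attn_layers=Encoder(dim=64, depth=2, heads=4),
)

tokens = torch.randint(0, 256, (2, 128))

# Boolean key-padding mask: True = real token, False = padding.
mask = torch.ones(2, 128, dtype=torch.bool)
mask[1, 100:] = False  # second sequence has 28 padded positions

out = model(tokens, mask=mask)  # padded positions are excluded inside attention
```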
Thanks! Do you know whether other implementations tend to do this as well? In `pytorch_geometric`, graphs can be batched so that memory usage scales with the total number of nodes/edges rather than with padded sequence length...
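For comparison, this is roughly what that `pytorch_geometric` batching looks like (a small sketch using `Batch.from_data_list`; the feature sizes are made up):

```python
import torch
from torch_geometric.data import Data, Batch

# Two graphs with different numbers of nodes (no padding needed).
g1 = Data(x=torch.randn(3, 16), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]))
g2 = Data(x=torch.randn(5, 16), edge_index=torch.tensor([[0, 1, 3], [1, 2, 4]]))

# Nodes are concatenated and edge indices offset, so memory grows with the
# total node/edge count rather than batch_size * max_num_nodes.
batch = Batch.from_data_list([g1, g2])
print(batch.x.shape)  # torch.Size([8, 16])  -> 3 + 5 nodes
print(batch.batch)    # tensor([0, 0, 0, 1, 1, 1, 1, 1]) maps each node to its graph
```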
Thanks, that's a good solution! Will check it out.