
Local multi-headed self-attention

Open ck-amrahd opened this issue 1 year ago • 3 comments

🚀 The feature, motivation and pitch

I am unable to find a clean implementation of local multi-headed self-attention in PyTorch Geometric. I found three layers that offer multi-head attention. The first is TransformerConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.TransformerConv.html#torch_geometric.nn.conv.TransformerConv), but it computes a linear combination of all features with different attention weights per head, as opposed to dividing the features into multiple heads and taking a linear combination within each head. RGATConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.RGATConv.html) goes in a similar direction. Finally, GPSConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.GPSConv.html) performs multi-head attention, but it is global rather than local.
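To make the distinction concrete, here is a minimal TransformerConv usage example (the sizes are arbitrary). With `heads=2` and the default `concat=True`, each head applies its own projection to the full 16-dimensional input, and the output concatenates `heads * out_channels` channels:

```python
import torch
from torch_geometric.nn import TransformerConv

x = torch.randn(4, 16)                      # 4 nodes, 16 features each
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 0, 3, 2]])   # 4 directed edges

conv = TransformerConv(16, 8, heads=2)      # each head projects all 16 input features,
out = conv(x, edge_index)                   # not an 8-feature slice
print(out.shape)                            # torch.Size([4, 16]) -> 2 heads * 8 channels
```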

Alternatives

I think it would be nice to have an implementation of local self-attention with multiple heads, where each head looks at a slice of the feature dimension.
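Below is a minimal sketch of what such a layer could look like, built on `MessagePassing` and `torch_geometric.utils.softmax`. The class name and all design choices (shared per-slice projections, additive aggregation, scaled dot-product scores) are assumptions for illustration, not an existing PyG API:

```python
import torch
from torch import nn
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import softmax


class LocalMultiHeadAttention(MessagePassing):
    """Hypothetical sketch: node features are first split into `heads`
    slices, and each head attends only over its own slice, with the
    softmax restricted to graph neighborhoods."""

    def __init__(self, channels: int, heads: int):
        super().__init__(aggr='add', node_dim=0)
        assert channels % heads == 0, 'channels must be divisible by heads'
        self.heads = heads
        self.head_dim = channels // heads
        # Per-slice projections (weights shared across heads for brevity);
        # each head only ever sees its own `head_dim`-sized slice of x.
        self.q = nn.Linear(self.head_dim, self.head_dim)
        self.k = nn.Linear(self.head_dim, self.head_dim)
        self.v = nn.Linear(self.head_dim, self.head_dim)

    def forward(self, x, edge_index):
        H, D = self.heads, self.head_dim
        x = x.view(-1, H, D)                        # split features into heads
        q, k, v = self.q(x), self.k(x), self.v(x)   # each [N, H, D]
        out = self.propagate(edge_index, q=q, k=k, v=v)  # [N, H, D]
        return out.view(-1, H * D)                  # concatenate heads again

    def message(self, q_i, k_j, v_j, index, ptr, size_i):
        # Per-head scaled dot-product score for every edge: [E, H].
        alpha = (q_i * k_j).sum(dim=-1) / self.head_dim ** 0.5
        # Independent softmax per head over each node's incoming edges.
        alpha = softmax(alpha, index, ptr, size_i)
        return v_j * alpha.unsqueeze(-1)            # [E, H, D]


x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
layer = LocalMultiHeadAttention(channels=16, heads=4)
print(layer(x, edge_index).shape)                   # torch.Size([4, 16])
```

Sharing the projection weights across heads keeps the sketch short; per-head `nn.Linear` modules would be a straightforward variation.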

Additional context

No response

ck-amrahd avatar Feb 26 '24 18:02 ck-amrahd

Hey @rusty1s, can I work on this issue?

Dsantra92 avatar Mar 13 '24 15:03 Dsantra92

Feel free to take this if you want :)

rusty1s avatar Mar 14 '24 15:03 rusty1s

TransformerConv may be a good starting point. I think the main change that needs to be made is in the softmax calculation for each head.
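For what it's worth, `torch_geometric.utils.softmax` already normalizes each head independently when the scores carry a head dimension, since it groups only along the edge dimension. A minimal standalone illustration (all shapes and values are made up):

```python
import torch
from torch_geometric.utils import softmax

E, H = 6, 2                                   # 6 edges, 2 heads (illustrative)
scores = torch.randn(E, H)                    # one raw score per edge and head
index = torch.tensor([0, 0, 1, 1, 1, 2])      # target node of each edge
alpha = softmax(scores, index)                # softmax per target node, per head
print(alpha.sum())                            # each (node, head) group sums to 1,
                                              # so the total is 3 nodes * 2 heads = 6.0
```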

ck-amrahd avatar Mar 16 '24 03:03 ck-amrahd