se3-transformer-pytorch
Does SE3 need pre-training?
Thank you for your work. I used your SE3 implementation as part of my model, but my current test results are not very good. I suspect this is because I do not fully understand your model. Here are my questions:
- Does your model need pre-training?
- Can I train the SE3 Transformer together with the fully connected layer that comes after it? Any advice is welcome.
- I've found that pre-training helps (100 batches, with a linear scale from 1e-6 up to 1e-4; see the sketch below). I've also found that a smaller depth (2 or 3) works better than a larger depth (>3).
- I'm not sure what you mean here. The fully connected layer that acts on type-1 features (i.e. 3d coordinates) in the attention block? Or the linear projection that projects the final output from d x 3 down to 1 x 3 (i.e. the projection from the hidden dimension to the output dimension)?
Either way, both of these are equivariant operations, so you can train with or without them. I recommend keeping them as-is.
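For what it's worth, here is a minimal sketch of how both suggestions could be wired up. It assumes the `SE3Transformer` constructor arguments from this repository's README (`dim`, `depth`, `num_degrees`, `output_degrees`, `reduce_dim_out`, `return_type`), and it interprets the scaling above as a linear learning-rate ramp from 1e-6 to 1e-4 over the first 100 batches. Treat it as a starting point, not a prescribed recipe.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR
from se3_transformer_pytorch import SE3Transformer

# Shallow network, since depth 2-3 reportedly works better than > 3.
# Constructor arguments follow the repository README; adjust dims to your data.
model = SE3Transformer(
    dim = 64,
    depth = 2,
    num_degrees = 2,
    output_degrees = 2,     # also emit type-1 (vector / coordinate-like) outputs
    reduce_dim_out = True   # final equivariant projection: d x 3 -> 1 x 3
)

optimizer = torch.optim.Adam(model.parameters(), lr = 1e-4)

# Linear warmup of the learning rate from 1e-6 up to 1e-4 over the first 100 batches,
# expressed as a multiplicative factor on the base lr (1e-4).
warmup_steps = 100
base_lr, start_lr = 1e-4, 1e-6

def lr_factor(step):
    if step >= warmup_steps:
        return 1.0
    return (start_lr + (base_lr - start_lr) * step / warmup_steps) / base_lr

scheduler = LambdaLR(optimizer, lr_lambda = lr_factor)

# Toy batch: per-node features, 3d coordinates, and a padding mask.
feats = torch.randn(2, 32, 64)
coors = torch.randn(2, 32, 3)
mask  = torch.ones(2, 32).bool()

# return_type = 1 asks for the type-1 (vector) output, e.g. a coordinate update.
delta = model(feats, coors, mask, return_type = 1)   # (2, 32, 3)
loss = (coors + delta).pow(2).mean()                  # placeholder loss

loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```

The scheduler multiplies the optimizer's base lr, so the factor starts at 0.01 (giving 1e-6) and reaches 1.0 (giving 1e-4) after 100 steps; after that the lr stays constant unless you add a decay.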