Questions about Stage 1 Training in ESM3 Structure Tokenizer
Hi,
Thank you for open-sourcing ESM3; it's an excellent resource! I'm currently training a VQ-VAE model inspired by the ESM3 structure tokenizer and have some questions about the details of Stage 1 training.
According to the ESM3 paper, Stage 1 uses five losses:
- Backbone Distance Loss
- Backbone Direction Loss
- Binned Direction Classification Loss
- Binned Distance Classification Loss
- Inverse Folding Loss
Here are my questions:
- Why is there no
commitment lossin Stage 1? Is there a specific reason it was excluded? - How are the loss weights of these five losses handled? Are the scaled equally, or do they have different weightings?
- In
Binned Direction Classification Loss, the paper mentions an output dot product shape are[L, L, 6]. Could you clarify the meaning of the dimension6in this context?
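For reference on the first question, this is a minimal sketch of the commitment term as defined in the original VQ-VAE formulation (van den Oord et al., 2017), beta * ||z_e - sg[e]||^2, where e is the nearest codebook vector to the encoder output z_e. It is not taken from the ESM3 code; the function names are hypothetical, and beta = 0.25 is the value used in the original VQ-VAE paper, not a known ESM3 setting. Pure Python computes only the forward value; in an autodiff framework, e would be detached so that this term pulls the encoder toward its chosen code without moving the codebook.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z_e: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return the index and value of the codebook entry nearest to z_e."""
    index = min(range(len(codebook)), key=lambda k: squared_distance(z_e, codebook[k]))
    return index, codebook[index]


def commitment_loss(
    encoder_outputs: Sequence[Vector],
    codebook: Sequence[Vector],
    beta: float = 0.25,
) -> float:
    """Forward value of beta * ||z_e - sg[e]||^2.

    Summed over feature dimensions, averaged over encoder output vectors.
    The stop-gradient sg[.] has no effect on the value, only on gradients.
    """
    total = 0.0
    for z in encoder_outputs:
        _, e = quantize(z, codebook)
        total += squared_distance(z, e)
    return beta * total / len(encoder_outputs)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    outputs = [[0.0, 0.5], [1.0, 0.5]]
    print(commitment_loss(outputs, codebook))
```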