Questions about Stage 1 Training in ESM3 Structure Tokenizer

Open Buddha7771 opened this issue 1 year ago • 0 comments

Hi,

Thank you for open-sourcing ESM3; it’s an excellent resource! I’m currently training a VQ-VAE model inspired by the ESM3 structure tokenizer and have some questions about the Stage 1 training details.

According to the ESM3 paper, Stage 1 uses five losses:

Backbone Distance Loss
Backbone Direction Loss
Binned Direction Classification Loss
Binned Distance Classification Loss
Inverse Folding Loss

Here are my questions:

Why is there no commitment loss in Stage 1? Is there a specific reason it was excluded?
How are these five losses weighted? Are they scaled equally, or do they have different weights?
For the Binned Direction Classification Loss, the paper mentions that the output dot products have shape [L, L, 6]. Could you clarify what the dimension of size 6 represents?
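
For concreteness, the sketch below shows the kind of weighted combination the first two questions are about. Everything in it is an illustrative assumption rather than something taken from the ESM3 paper or code: the function name `total_stage1_loss`, the dictionary keys, the equal placeholder weights, and the optional commitment term. The default beta of 0.25 is just a value commonly used in VQ-VAE setups.

```python
from typing import Dict, Optional

# Placeholder weights (assumption): the actual values are what this issue asks about.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "backbone_distance": 1.0,
    "backbone_direction": 1.0,
    "binned_direction_cls": 1.0,
    "binned_distance_cls": 1.0,
    "inverse_folding": 1.0,
}


def total_stage1_loss(
    losses: Dict[str, float],
    weights: Dict[str, float] = EXAMPLE_WEIGHTS,
    commitment: Optional[float] = None,
    beta: float = 0.25,
) -> float:
    """Weighted sum of the five Stage 1 losses, plus an optional commitment term.

    `commitment` stands for an already-computed VQ-VAE commitment loss
    (squared distance between encoder outputs and their codebook vectors,
    with gradients stopped on the codebook side). Pass None to leave it out,
    which matches the five-loss list given for Stage 1.
    """
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")

    # Weighted sum over the named loss terms.
    total = sum(weights[name] * losses[name] for name in weights)

    # Hypothetical extra term: only added when a commitment loss is supplied.
    if commitment is not None:
        total += beta * commitment
    return total
```

With every loss equal to 1.0 and the equal placeholder weights, this returns 5.0; passing `commitment=2.0` with the default beta adds 0.5 on top.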

Jan 16 '25 06:01 Buddha7771