Embedding Multichain proteins with ESM3 and ESMC

Open Aurelien-Pelissier opened this issue 9 months ago • 1 comments

Hi,

Many proteins naturally exist as paired chains, often referred to as alpha and beta chains, such as TCRs, antibodies, and MHCs. I'm curious how ESM3/ESMC handles these cases.

  • Does ESM3/ESMC have a special separator or handling mechanism for paired chains? I've seen the '|' separator in the sequence tokenizer.
  • When using ESMProtein.from_pdb(pdbID), how does ESM handle multi-chain proteins?
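
To make the first question concrete, here is a minimal sketch of what I have in mind. It uses plain Python only and does not call the esm library. It assumes '|' is the right character to place between chains, which is exactly what I am asking about. The helper name `join_chains` and the example sequences are made up for illustration.

```python
from typing import Dict, Tuple

# Assumption: '|' is the chain separator seen in the sequence tokenizer.
CHAIN_BREAK = "|"


def join_chains(chains: Dict[str, str]) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Join chain sequences with CHAIN_BREAK.

    Returns the joined string and, for each chain ID, the half-open
    [start, end) span of that chain within the joined string.
    """
    spans: Dict[str, Tuple[int, int]] = {}
    parts = []
    offset = 0
    for chain_id, seq in chains.items():
        if CHAIN_BREAK in seq:
            raise ValueError(f"chain {chain_id!r} already contains {CHAIN_BREAK!r}")
        if parts:
            # Account for the separator inserted before this chain.
            offset += len(CHAIN_BREAK)
        spans[chain_id] = (offset, offset + len(seq))
        parts.append(seq)
        offset += len(seq)
    return CHAIN_BREAK.join(parts), spans


# Hypothetical two-chain example (toy sequences, not a real protein).
joined, spans = join_chains({"A": "ACDE", "B": "FGH"})
print(joined)
print(spans)
```

The spans would let me split per-residue embeddings back into per-chain blocks afterwards. If the tokenizer adds start/end tokens, the indices would presumably need to be shifted accordingly.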

Would appreciate any insights on this!

Thanks!

Aurelien-Pelissier, Mar 23 '25 23:03

I am also curious whether some of the examples (for instance, 3_gfp_design.ipynb) could include multimers, since I found that "esm3-medium-multimer-2024-09" is available and multi-chain proteins are of much greater interest to researchers. Thanks!

SimonYYS, Sep 15 '25 05:09