Repetitive sequence generation from ESM3 with conditioning

Open • wwzll123 opened this issue 1 year ago • 1 comment

I am trying to use ESM3 for sequence design by specifying the coordinates and amino acid composition of key motifs.

However, I found that the default temperature of 0.7 produces highly repetitive sequences for certain proteins.

For example, "MAAAAAAAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAKADAADAEAIAAAAAAGATPVAKAANEALATYTAKAGVIFAQDQGKNAQALPAIQAAHAAFASARYIAAYARGAAAYALAGVLDAAAAAGIAIAAAAAAAAAAAKTAAAGLAAAAAAAAAATAAAAAKAAVAAAAAAAAAATASANAAAMAAAAAAPEDTATAAGIALLPVPGDLAAAAAAAAAAAAAAAAAAAAAAVAAAAAAAAVAAAAAAAAAAVAAAAGAKAAAAAAAAAALAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADQGAEWLSRLDRGANAAAAAAAAAAGAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAAAAAAGAAATALPRGGAKGALLAAEGGASVLATGGGRFAPRIRDLADVAAPANGLKDAGAYEAAGGALKGAAAGAVAAAGAAAAAVAAAAATAGATGFLATANGLAAIGSDLAAVTVAVAAGINAAANSAGAQALNKGEAINAAFSAAGAAAAAQAAATADNAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVAAAAAAA".
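A rough way to put numbers on the repetition is sketched below. `repetition_stats` is a hypothetical helper written for this illustration, not part of the ESM3 package; it reports the most frequent residue, its share of the sequence, and the longest run of identical residues.

```python
from collections import Counter
from itertools import groupby


def repetition_stats(seq: str) -> dict:
    """Summarize how repetitive an amino acid sequence is.

    Hypothetical helper for illustration; not an ESM3 API.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    # Most frequent residue and how often it occurs.
    residue, count = Counter(seq).most_common(1)[0]
    # Longest stretch of one residue repeated back to back.
    longest_run = max(len(list(group)) for _, group in groupby(seq))
    return {
        "top_residue": residue,
        "top_fraction": count / len(seq),
        "longest_run": longest_run,
    }
```

Running it on the generated sequence and on the native chain gives two sets of numbers to compare, which makes it easier to state how much worse the repetition is than in the original protein.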

The original residue information comes from chain A of 7qd4. I prompted with only a small DNA-binding region.

When I raised the temperature to 1.0, the repetition eased, but the pLDDT and pTM scores dropped significantly.
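For context on why temperature has this effect, here is a minimal sketch of temperature-scaled softmax sampling in general. It is not ESM3's actual decoding code, and the logits are made-up values. Dividing logits by a temperature below 1 sharpens the distribution, so a residue that is already favored (such as alanine here) becomes even more likely.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate residues; the first is the favored one.
example_logits = [2.0, 1.0, 0.5]
p_at_07 = softmax_with_temperature(example_logits, 0.7)
p_at_10 = softmax_with_temperature(example_logits, 1.0)
# The favored residue gets a larger share of the probability mass at T=0.7.
```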

This may be a common problem with language models. It would be great if you could devise a more principled decoding strategy.
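To illustrate the kind of strategy I mean, one generic option from text generation is a frequency penalty on the logits. The sketch below is only an assumption about what could help. `penalize_frequent_residues` and its `penalty` parameter are hypothetical, nothing here is an ESM3 feature, and I have not tested whether it preserves pLDDT or pTM.

```python
from collections import Counter


def penalize_frequent_residues(logits: dict, partial_seq: str,
                               penalty: float = 1.0) -> dict:
    """Lower each residue's logit in proportion to its share of partial_seq.

    Hypothetical illustration of a frequency penalty; not an ESM3 API.
    `logits` maps one-letter residue codes to raw scores, and `partial_seq`
    holds the residues that have already been decoded.
    """
    if not partial_seq:
        return dict(logits)
    counts = Counter(partial_seq)
    n = len(partial_seq)
    return {
        aa: score - penalty * counts.get(aa, 0) / n
        for aa, score in logits.items()
    }
```

For example, with equal logits for A and K and a partial sequence of `AAAK`, the adjusted logit for A ends up lower than for K, which nudges sampling away from extending the alanine run.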

Thanks for the great study!

wwzll123, Jan 13 '25 03:01

Do you have code for this? We don't see this much repetition at T=0.7, so this is a bit surprising. What's the exact conditioning prompt?

ebetica, Sep 19 '25 20:09