protein_generator

AA compositional bias

Open. AlexWindels opened this issue 1 year ago. 1 comment

Hi all,

I am currently exploring protein_generator on the Hugging Face Space. I tried the AA compositional bias conditioning with the following example: 'W0.2,E0.1', using 40 diffusion steps and a protein length of 250 residues. This resulted in the following protein sequence:

AAPPPAAAVAAAAAAAPPAPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPAAAAAAAAAAAAAAAAAAAAAAAPAAAALAAAAPAPAAAAAAAPAAAVAAAAAAAAAAAAAAAAAAAAAAAPAAAPAAAAAAAAAAAAAVAAAAAAAAAAAAPAAVPAAAAAAAAAAAAAAAAAAAAAPAAAAAAAAAAAPAAAAPAAAAAAAAAAAAPAAAAAAAAAALAAAAAAAAAVA

As you can see, the sequence consists almost exclusively of alanine and contains no tryptophan or glutamic acid at all, even though I explicitly conditioned on these residues. When I change the residues and/or the bias values, the results are similar, and I never obtain a sequence that comes close to the requested composition.
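To make the mismatch measurable, here is a minimal sketch that compares the observed residue fractions of a generated sequence with the requested bias. It assumes that a spec such as 'W0.2,E0.1' means target residue fractions, which is my reading and not something I have confirmed. The helper names `parse_bias`, `composition`, and `compare_to_bias` are hypothetical and not part of protein_generator.

```python
from collections import Counter


def parse_bias(spec):
    """Parse a bias string such as 'W0.2,E0.1' into {'W': 0.2, 'E': 0.1}.

    Assumption: each item is a one-letter residue code followed by a fraction.
    """
    targets = {}
    for item in spec.split(","):
        item = item.strip()
        if item:
            targets[item[0].upper()] = float(item[1:])
    return targets


def composition(sequence):
    """Return the fraction of each residue type in the sequence."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence is empty")
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def compare_to_bias(sequence, spec):
    """Return {residue: (requested, observed)} for each residue in the spec."""
    observed = composition(sequence)
    return {
        aa: (target, observed.get(aa, 0.0))
        for aa, target in parse_bias(spec).items()
    }


if __name__ == "__main__":
    # Toy alanine-rich sequence for illustration; paste the generated one here.
    toy_sequence = "AAPPAAAVAAAALAAAAAAA"
    for aa, (requested, observed) in compare_to_bias(toy_sequence, "W0.2,E0.1").items():
        print(f"{aa}: requested {requested:.2f}, observed {observed:.2f}")
```

Running this on the full generated sequence gives a requested versus observed fraction per conditioned residue, which makes it easier to compare runs with different lengths or step counts.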

Can you confirm whether something is going wrong here?

Best,

Alex

AlexWindels, Dec 12 '23 10:12

Hey Alex,

I would try a shorter protein (100 aa) or more steps (100 steps). The network often struggles to generate cohesive sequence and structure pairs at larger lengths. If there's a more specific application you are going for here, let me know and I am happy to discuss more!

0merle0, Dec 24 '23 02:12