
Results with custom MSA are different and misfolded compared to the result from --use_msa_server

Open • kartik-rallapalli opened this issue 1 year ago • 5 comments

I am trying to fold a protein+RNA structure. With the server-generated MSAs (--use_msa_server), the predicted fold is correct (green ribbons in Screenshot 2025-01-08 at 12 02 00 PM). When I predict the same complex with a custom MSA, however, the RNA fold looks distorted (orange in Screenshot 2025-01-08 at 12 05 06 PM). Intriguingly, the custom MSA in this case is the output generated by --use_msa_server itself. Has anyone encountered or resolved such an issue? Thank you!

kartik-rallapalli, Jan 08 '25 20:01

Sorry, I am not sure how to resolve this issue, but I wonder which MSA file you used from the --use_msa_server run. Did you use bfd*.a3m or uniref.a3m from the output msa folder? Thanks a lot!

weitse-hsu, Jan 23 '25 02:01

Thanks for taking a look at this issue. I used the uniref.a3m file! I believe something goes wrong in the way the last line of a custom MSA input file is processed (line 4459 in the screenshot below).

[Screenshot: the custom MSA file around line 4459]

Along these lines, do custom MSAs need to include UniRef FASTA IDs? And do these MSA homologs help select PDB templates, as in AlphaFold?

Thanks again for creating this repo and sharing the amazing work!

kartik-rallapalli, Jan 23 '25 17:01
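If the problem really is in how the end of the file is handled, one way to check a custom a3m before passing it to boltz is to parse it yourself and look for a dangling header with no sequence, or for rows whose aligned length differs from the query. The sketch below uses hypothetical helper names (parse_a3m, aligned_length, find_problems are illustrative, not part of boltz) and assumes the usual a3m convention that uppercase letters and '-' are match columns while lowercase letters are insertions. It is a sanity check, not the parser boltz itself uses.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Split a3m text into (header, sequence) records, joining wrapped lines."""
    records: list[tuple[str, str]] = []
    header = None
    seq_parts: list[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header = line[1:]
            seq_parts = []
        elif header is None:
            raise ValueError(f"line {lineno}: sequence before any header")
        else:
            seq_parts.append(line)
    if header is not None:
        # Keep the final record even if its sequence is missing, so it can be reported.
        records.append((header, "".join(seq_parts)))
    return records


def aligned_length(seq: str) -> int:
    """Count match columns: uppercase residues and '-' gaps (lowercase = insertions)."""
    return sum(1 for c in seq if c.isupper() or c == "-")


def find_problems(records: list[tuple[str, str]]) -> list[str]:
    """Report empty records and rows whose aligned length differs from the query."""
    if not records:
        return ["no records found"]
    query_len = aligned_length(records[0][1])
    problems = []
    for i, (header, seq) in enumerate(records):
        if not seq:
            problems.append(f"record {i} ({header!r}) has no sequence")
        elif aligned_length(seq) != query_len:
            problems.append(
                f"record {i} ({header!r}) has {aligned_length(seq)} aligned columns, "
                f"query has {query_len}"
            )
    return problems


# Toy example: the last header has no sequence under it.
example = ">query\nACDE\n>hit1\nAC-E\n>hit2\nAcCDE\n>dangling\n"
print(find_problems(parse_a3m(example)))
# prints: ["record 3 ('dangling') has no sequence"]
```

Running the same check on the real uniref.a3m would at least show whether the final record is empty or misaligned.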

Hi @kartik-rallapalli, I checked the source code a bit, and it looks like when --use_msa_server is on, the preprocessed MSA in the CSV file generated by compute_msa is what gets used for downstream processing. So replacing uniref.a3m with that CSV file in your FASTA file should probably address the issue. Or have you already tried this?

weitse-hsu, Feb 06 '25 23:02
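For a custom MSA that did not come from the server, the same idea can be sketched by converting the a3m into a CSV before referencing it. The layout here (a key,sequence header, one aligned sequence per row) is an assumption modeled on what compute_msa appears to write; setting every key to -1 marks all rows as unpaired, since a plain a3m carries no pairing information. The FASTA header form >CHAIN_ID|ENTITY_TYPE|MSA_PATH is taken from the boltz input-format docs as I understand them, so check it against your installed version. Both function names are hypothetical.

```python
import csv
import io


def a3m_to_csv(a3m_text: str) -> str:
    """Convert a3m text into a 'key,sequence' CSV string (hypothetical helper).

    Assumption: this layout mirrors the CSV that boltz's compute_msa step
    appears to write. Every row gets key -1 (unpaired) because a plain a3m
    has no pairing information.
    """
    seqs: list[str] = []
    current = None
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                seqs.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)  # wrapped sequence lines are joined
    if current is not None:
        seqs.append("".join(current))
    # Drop header-only records, e.g. a dangling header at the end of the file.
    seqs = [s for s in seqs if s]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "sequence"])
    for seq in seqs:
        writer.writerow([-1, seq])
    return out.getvalue()


def fasta_record(chain_id: str, entity_type: str, msa_path: str, sequence: str) -> str:
    """Build a FASTA entry in the assumed >CHAIN_ID|ENTITY_TYPE|MSA_PATH form."""
    return f">{chain_id}|{entity_type}|{msa_path}\n{sequence}\n"
```

Writing the returned CSV to a file (msa/chain_A.csv is just an illustrative path) and using fasta_record("A", "protein", "msa/chain_A.csv", sequence) as the protein entry follows the fix suggested above. If the server run already produced a CSV, pointing the FASTA entry at that file directly is simpler.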

Also, it seems that for structures with relatively low confidence (around 0.8, I would say, since I rarely see a score much lower than that), predictions can vary a lot between replicates even when you use exactly the same MSA.

weitse-hsu, Feb 09 '25 00:02

Hi @wehs7661, I tried using the CSV file instead of the a3m file and it works smoothly: my prediction is now within 2 Å RMSD of the experimental PDB structure. Thanks for the CSV tip, this solves my problem!

kartik-rallapalli, Feb 10 '25 20:02
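An RMSD like the 2 Å figure above is normally measured after optimally superposing the prediction onto the experimental structure. A minimal sketch of that calculation with the Kabsch algorithm in NumPy, assuming the two coordinate arrays already list the same atoms in the same order (matching atoms by chain and residue is left out); kabsch_rmsd is an illustrative name, not a boltz function.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape[1] != 3 or P.shape != Q.shape:
        raise ValueError("expected two (N, 3) arrays with the same shape")
    # Remove translation by centering each set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

The same function could be used to compare replicate predictions with each other, which is one way to quantify the replicate-to-replicate variation mentioned earlier in the thread.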