boltz: Results with custom MSA are different and misfolded compared to the result from --use_msa_server

I am trying to fold a protein+RNA complex. The predicted fold is correct with the server-generated MSAs (green ribbons in the first screenshot), but when I predict the same complex with a custom MSA, the RNA fold is distorted (orange in the second screenshot). Intriguingly, the custom MSA used here is the output generated by --use_msa_server. Has anyone encountered or resolved such an issue? Thank you!

Jan 08 '25 20:01 kartik-rallapalli

Sorry, I am not sure how to resolve this issue, but I wonder which MSA file you used from the --use_msa_server run. Did you use bfd*.a3m or uniref.a3m from the output msa folder? Thanks a lot!

Jan 23 '25 02:01 weitse-hsu

Thanks for taking a look at this issue. I used the uniref.a3m file! I suspect the problem is in the way the last line of a custom MSA input file is processed (line 4459 in the screenshot below).
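One way to check the last-line suspicion outside of Boltz is to parse the a3m file independently and look at the final record. The sketch below is a generic diagnostic, not Boltz's own parser; the function name summarize_a3m and the example text are hypothetical.

```python
def summarize_a3m(text):
    """Summarize an a3m/FASTA-style alignment given as a string.

    Generic diagnostic only (an assumption for illustration);
    this is not how Boltz itself reads MSA files.
    """
    headers, seqs = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            headers.append(line[1:])
            seqs.append("")
        elif seqs:
            seqs[-1] += line  # sequences may wrap over several lines
    return {
        "n_records": len(headers),
        "last_header": headers[-1] if headers else None,
        "last_sequence": seqs[-1] if seqs else None,
        "ends_with_newline": text.endswith("\n"),
    }


# Hypothetical two-record alignment with no trailing newline.
example = ">query\nMKV\n>hit1\nMkV-"
summary = summarize_a3m(example)
print(summary)
```

For a real file, read it with open(path).read() and compare n_records and last_sequence against what you expect at the end of the file; a missing final newline or an empty last_sequence would be worth reporting.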

Along these lines, are custom MSAs required to have the UniRef FASTA IDs? Do the MSA homologs help select PDB templates, as in AlphaFold?

Thanks again for creating this repo and sharing the amazing work!

Jan 23 '25 17:01 kartik-rallapalli

Hi @kartik-rallapalli, I checked the source code a bit, and it looks like when --use_msa_server is on, the preprocessed MSA in the CSV file generated by compute_msa is what gets used for downstream processing. Replacing uniref.a3m with that CSV file in your FASTA file should probably address the issue. Have you tried this yet?
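As a rough illustration of that swap, here is a minimal sketch that builds a Boltz-style FASTA input whose protein header points at a CSV MSA. It assumes the >CHAIN_ID|ENTITY_TYPE|MSA_PATH header layout; the helper name, the sequences, and the CSV path are placeholders, so substitute the CSV that your own server run wrote.

```python
def boltz_fasta_record(chain_id, entity_type, sequence, msa_path=None):
    """Format one FASTA record as >CHAIN_ID|ENTITY_TYPE|MSA_PATH.

    Hypothetical helper for illustration; the MSA path is only
    appended when one is given (e.g. for the protein chain).
    """
    header = f">{chain_id}|{entity_type}"
    if msa_path is not None:
        header += f"|{msa_path}"
    return f"{header}\n{sequence}\n"


# Placeholder sequences and CSV path; these are assumptions, not
# values taken from this thread.
fasta_text = (
    boltz_fasta_record("A", "protein", "MKTAYIAKQR", msa_path="./msa/example_0.csv")
    + boltz_fasta_record("B", "rna", "GGCUAGCC")
)
print(fasta_text)
```

Writing fasta_text to a .fasta file and running the prediction without --use_msa_server would then use the CSV as the custom MSA for chain A.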

Feb 06 '25 23:02 weitse-hsu

Also, it seems that for structures with relatively low confidence (around 0.8, I would say, as I rarely see a score much lower than that), the predictions can vary a lot between replicates even if you use exactly the same MSA information.

Feb 09 '25 00:02 weitse-hsu

Hi @wehs7661, I tried using the CSV file instead of the a3m file, and it works smoothly: my prediction comes out within 2 Å RMSD of the experimental PDB structure. Thanks for the CSV tip, this solves my problem!

Feb 10 '25 20:02 kartik-rallapalli
