proteinfold
Pipeline Fails at compare_structures During align_structures Step in Multimer Mode
Description of the bug
When running the proteinfold pipeline in multimer mode, execution fails in the compare_structures process, specifically during the align_structures step. The cause appears to be that ESMFold does not reset residue positions for additional chains.
Proposed solution: add a function to generate_comparison_report.py that resolves the misalignment of the PDB structures.
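As a rough illustration of the proposed fix, the sketch below renumbers residues so that every chain starts at 1 again. It is not the code in generate_comparison_report.py: the function name `renumber_per_chain` is hypothetical, it works on raw PDB text lines using only the fixed-column PDB layout, and it assumes the chains already carry distinct chain IDs. It also numbers residues sequentially, so any genuine gaps within a chain are closed up, which should be harmless for predicted structures but is an assumption.

```python
def renumber_per_chain(pdb_lines):
    """Return PDB lines with residues renumbered from 1 within each chain.

    Hypothetical helper: only ATOM/HETATM records are rewritten, and
    insertion codes are dropped because residues are numbered sequentially.
    """
    out = []
    counters = {}  # chain ID -> residue number assigned most recently
    last_key = {}  # chain ID -> original (number + insertion code) last seen
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]       # column 22: chain identifier
            key = line[22:27]      # columns 23-27: residue number + insertion code
            if last_key.get(chain) != key:
                # A new residue starts in this chain: advance its counter.
                counters[chain] = counters.get(chain, 0) + 1
                last_key[chain] = key
            # Write the new number right-aligned in 4 columns, blank insertion code.
            line = f"{line[:22]}{counters[chain]:>4} {line[27:]}"
        out.append(line)
    return out
```

Applying this to each predicted PDB file before alignment would give every method the same per-chain numbering, provided the chain IDs and sequences themselves agree.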
Command used and terminal output
nextflow run main.nf -profile test_full_alphafold2_multimer,test_full_colabfold_local,singularity --mode alphafold2,esmfold,colabfold --esmfold_db /path/to/db --esmfold_model_preset multimer --colabfold_server local --colabfold_db /path/to/db/colabfold_dbs --colabfold_model_preset alphafold2_multimer_v2 --use_templates false --use_gpu --input sequence_multimer.csv --outdir /path/to/outdir
ERROR:
Starting...
generating html report...
Traceback (most recent call last):
  File "/scratch/er01/mb0076/proteinfold/generate_rerror/generate_comparison_report.py", line 214, in <module>
    aligned_structures = align_structures(structures)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/er01/mb0076/proteinfold/generate_rerror/generate_comparison_report.py", line 133, in align_structures
    super_imposer.set_atoms(ref_atoms, target_atoms)
  File "/opt/conda/lib/python3.12/site-packages/Bio/PDB/Superimposer.py", line 35, in set_atoms
    raise PDBException("Fixed and moving atom lists differ in size")
Bio.PDB.PDBExceptions.PDBException: Fixed and moving atom lists differ in size
Relevant files
No response
System information
No response
This happens because ESMFold does not natively support a multimer mode. Instead, it approximates multimers by predicting in monomer mode with a large gap in residue indices between chains. You can see the implementation here. Based on the defaults, residue indices between chains will always be offset by 512 + 25.
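One way to confirm this on a given output file is to scan it for jumps in residue numbering. The sketch below is only a diagnostic under stated assumptions: `find_numbering_jumps` and its `min_jump` threshold are hypothetical names, and it reads the residue number straight from the fixed PDB columns rather than going through Biopython.

```python
def find_numbering_jumps(pdb_lines, min_jump=2):
    """Return (previous, current) pairs where the residue number jumps forward.

    Each element is a (chain ID, residue number) tuple. A jump is reported
    when consecutive residues in file order differ by at least `min_jump`.
    """
    jumps = []
    prev = None  # (chain ID, residue number) of the previous ATOM record
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        cur = (line[21], int(line[22:26]))  # column 22 and columns 23-26
        if prev is not None and cur != prev and cur[1] - prev[1] >= min_jump:
            jumps.append((prev, cur))
        prev = cur
    return jumps
```

An ESMFold multimer prediction affected by this issue would be expected to report one jump per chain boundary, while a file whose chains each restart at 1 should report none.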