
Pipeline Fails at compare_structures During align_structures Step in Multimer Mode

Open Mitchob opened this issue 9 months ago • 1 comments

Description of the bug

When running the proteinfold pipeline in multimer mode, execution fails in the compare_structures process, specifically during the align_structures step. The issue seems to be that ESMFold does not reset residue positions for additional chains, so the atom lists being superimposed end up with different sizes.

Proposed solution: add a function to generate_comparison_report.py that resolves the misalignment of the PDB structures.
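As a rough illustration of what such a function could do (this is not the actual code in generate_comparison_report.py), one option is to keep only the atoms that both structures have in common before superimposing. The sketch below assumes each structure has already been reduced to a dict keyed by (chain_id, residue_number); the name `pair_common_atoms` and that dict layout are hypothetical.

```python
def pair_common_atoms(ref_atoms, target_atoms):
    """Return two equal-length lists holding only atoms present in both inputs.

    Hypothetical helper: both arguments are assumed to be dicts mapping
    (chain_id, residue_number) -> atom object (for example a CA atom).
    """
    # Keys present in both structures, sorted so the two lists stay in step.
    shared = sorted(ref_atoms.keys() & target_atoms.keys())
    ref_list = [ref_atoms[key] for key in shared]
    target_list = [target_atoms[key] for key in shared]
    return ref_list, target_list
```

The two returned lists always have the same length, so they could be passed to `super_imposer.set_atoms(ref_list, target_list)` without triggering the size check. This only helps if the residue numbers of the two structures actually correspond, which is why the numbering would need to be normalised first.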

Command used and terminal output

nextflow run main.nf -profile test_full_alphafold2_multimer,test_full_colabfold_local,singularity --mode alphafold2,esmfold,colabfold --esmfold_db /path/to/db --esmfold_model_preset multimer --colabfold_server local --colabfold_db /path/to/db/colabfold_dbs --colabfold_model_preset alphafold2_multimer_v2 --use_templates false --use_gpu --input sequence_multimer.csv --outdir /path/to/outdir

ERROR:
Starting...
generating html report...
Traceback (most recent call last):
  File "/scratch/er01/mb0076/proteinfold/generate_rerror/generate_comparison_report.py", line 214, in <module>
    aligned_structures = align_structures(structures)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/er01/mb0076/proteinfold/generate_rerror/generate_comparison_report.py", line 133, in align_structures
    super_imposer.set_atoms(ref_atoms, target_atoms)
  File "/opt/conda/lib/python3.12/site-packages/Bio/PDB/Superimposer.py", line 35, in set_atoms
    raise PDBException("Fixed and moving atom lists differ in size")
Bio.PDB.PDBExceptions.PDBException: Fixed and moving atom lists differ in size

Relevant files

No response

System information

No response

Mitchob avatar Mar 24 '25 02:03 Mitchob

This is because ESMFold does not natively support a multimer mode. Instead, it approximates multimers by predicting in monomer mode with a large gap in residue indices between chains. You can see the implementation here. Based on the defaults, residue indices between chains will always be offset by 512 + 25.
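A minimal sketch of how that gap could be undone, assuming the residue numbers have already been read out of the ESMFold PDB as a flat list of integers. The function name `split_and_renumber` and the `min_gap` threshold of 512 are assumptions for illustration only; the threshold just needs to be below the inter-chain offset and above any gap expected inside a chain.

```python
def split_and_renumber(residue_numbers, min_gap=512):
    """Split residue numbers into segments at large jumps, renumbering each from 1.

    Hypothetical helper: a jump of at least `min_gap` between consecutive
    residue numbers is treated as a chain boundary (an assumption based on
    the offset described above).
    """
    segments = []
    current = []
    previous = None
    for number in residue_numbers:
        # A large jump in numbering marks the start of a new chain segment.
        if previous is not None and number - previous >= min_gap:
            segments.append(current)
            current = []
        current.append(number)
        previous = number
    if current:
        segments.append(current)
    # Shift each segment so its first residue is numbered 1; small gaps
    # inside a segment are preserved.
    return [[n - seg[0] + 1 for n in seg] for seg in segments]
```

For example, `split_and_renumber([1, 2, 3, 541, 542])` gives `[[1, 2, 3], [1, 2]]`, so the second chain starts again at 1 and can be matched against the numbering used by the other structures.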

tlitfin-unsw avatar Mar 25 '25 03:03 tlitfin-unsw