
Can't find template .cif, possibly because a succession of obsolete PDB IDs/codes is not followed through to the PDB ID/code that actually exists

Open sh999 opened this issue 1 year ago • 3 comments

I ran a query with a FASTA sequence and got an error because a .cif file was not found (note that line numbers may not match the most recent commit):

Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 427, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 412, in main
    is_prokaryote=is_prokaryote)
  File "/app/alphafold/run_alphafold.py", line 164, in predict_structure
    msa_output_dir=msa_output_dir)
  File "/app/alphafold/alphafold/data/pipeline.py", line 212, in process
    hits=pdb_template_hits)
  File "/app/alphafold/alphafold/data/templates.py", line 901, in get_templates
    kalign_binary_path=self._kalign_binary_path)
  File "/app/alphafold/alphafold/data/templates.py", line 737, in _process_single_hit
    cif_string = _read_file(cif_path)
  File "/app/alphafold/alphafold/data/templates.py", line 681, in _read_file
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/share/singularity/images/Alphafold/Data/2.1.0/pdb_mmcif/mmcif_files/7byz.cif'

Looking at obsolete.dat, I see these entries (only the relevant rows are shown):

OBSLTE    13-MAY-20 6L2V     7BYZ
OBSLTE    22-JUL-20 7BYZ     7CH2
OBSLTE    25-NOV-20 7CH2     7DE5

So the chain of obsolete entries goes like this: 6L2V -> 7BYZ -> 7CH2 -> 7DE5, where 7DE5 is the final, non-obsolete successor. I checked the local filesystem and confirmed that 7de5.cif exists.

I think the program has a bug: it does not walk through that chain of obsolete entries to reach 7de5.cif. It only follows the chain one step, which in this case is 6L2V -> 7BYZ. It then can't find 7byz.cif, and the program errors out.

In data/templates.py, I see what might be the problem. The code looks up the PDB ID/code, but I don't see a "search" that follows the first obsolete entry all the way to the final successor. Below, the program gets the PDB code with a single dictionary lookup on the parsed obsolete PDBs file. If my assumption is right, it only jumps one step. So in my case, it goes from 6L2V to 7BYZ and stops looking after that.

    if hit_pdb_code in obsolete_pdbs:
      hit_pdb_code = obsolete_pdbs[hit_pdb_code]
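To illustrate what I mean by a "search", here is a minimal sketch of a lookup that keeps following successors until it reaches a code that is no longer listed as obsolete. This is not AlphaFold code: `resolve_obsolete` is a hypothetical helper, and I'm assuming `obsolete_pdbs` is a plain dict that maps each obsolete code to its successor, or to `None` when the entry was removed without a replacement.

```python
from typing import Mapping, Optional


def resolve_obsolete(pdb_code: str,
                     obsolete_pdbs: Mapping[str, Optional[str]]) -> Optional[str]:
  """Follows successor links until reaching a code that is not obsolete.

  Hypothetical helper, not part of AlphaFold. Returns None if the chain
  ends in an entry that was removed without a replacement.
  """
  seen = set()
  current = pdb_code
  while current in obsolete_pdbs:
    if current in seen:
      # Guard against a malformed file whose entries point back at each other.
      raise ValueError(f'Cycle in obsolete chain starting at {pdb_code}')
    seen.add(current)
    current = obsolete_pdbs[current]
    if current is None:
      return None
  return current


# The three rows from my obsolete.dat, written as an assumed lowercase mapping.
obsolete_pdbs = {'6l2v': '7byz', '7byz': '7ch2', '7ch2': '7de5'}
print(resolve_obsolete('6l2v', obsolete_pdbs))  # 7de5
```

With something like this in place of the single lookup, the hit would resolve to 7de5, whose .cif file does exist locally.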

My workaround was to create a custom obsolete PDBs file in which the entry has no successor, so that the hit is skipped entirely. With that change, the job ran to completion. This is the edit:

I changed the line:

OBSLTE    13-MAY-20 6L2V     7BYZ

into this line:

OBSLTE    13-MAY-20 6L2V   

sh999 avatar May 26 '23 23:05 sh999

Hey, I ran into a similar problem, but with 6sng.cif and 5cvx.cif, for two different sequences. Unfortunately, I don't exactly understand what you mean by creating a custom obsolete.dat.

andrejberg avatar Jul 10 '23 15:07 andrejberg

@andrejberg I copied the obsolete.dat that came with the installation into a new obsolete.dat file in my working directory, because the original file is read-only. I then edited the new copy so that it forces the program to not even bother looking for a .cif that replaces the obsolete structure.
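If you'd rather not edit the copy by hand, the same change can be scripted. This is only a sketch: `strip_successors` is a name I made up, and it assumes each relevant row looks like the `OBSLTE <date> <obsolete ID> <successor>` rows shown above. It cuts the row right after the obsolete ID, so the trailing spaces from my hand edit are not reproduced.

```python
from typing import Iterable, List, Set


def strip_successors(lines: Iterable[str], codes: Set[str]) -> List[str]:
  """Returns the lines with successor IDs removed for the given obsolete codes.

  Hypothetical helper. `codes` holds upper-case obsolete IDs, e.g. {'6L2V'}.
  All other lines are passed through unchanged.
  """
  result = []
  for line in lines:
    fields = line.split()
    if len(fields) >= 3 and fields[0] == 'OBSLTE' and fields[2].upper() in codes:
      # Find where the obsolete ID ends (searching after the date field)
      # and drop everything that follows it.
      start = line.index(fields[1]) + len(fields[1])
      cut = line.index(fields[2], start) + len(fields[2])
      line = line[:cut] + '\n'
    result.append(line)
  return result


rows = [
    'OBSLTE    13-MAY-20 6L2V     7BYZ\n',
    'OBSLTE    22-JUL-20 7BYZ     7CH2\n',
]
print(''.join(strip_successors(rows, {'6L2V'})), end='')
```

To use it, read the lines of the installed obsolete.dat, pass them through the function, and write the result to a new file in a directory you can write to. Then point AlphaFold at the new file instead of the original; as far as I know, run_alphafold.py takes this through its `--obsolete_pdbs_path` flag, but check how your installation passes it.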

sh999 avatar Jul 15 '23 05:07 sh999

@sh999 thank you very much!

andrejberg avatar Jul 25 '23 11:07 andrejberg