alphafold
Template .cif not found: chain of obsolete PDB IDs is not followed to the successor that actually exists
I ran a query with a FASTA sequence and got an error because a .cif file was not found (note that line numbers may not match the most current commit):
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 427, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 412, in main
    is_prokaryote=is_prokaryote)
  File "/app/alphafold/run_alphafold.py", line 164, in predict_structure
    msa_output_dir=msa_output_dir)
  File "/app/alphafold/alphafold/data/pipeline.py", line 212, in process
    hits=pdb_template_hits)
  File "/app/alphafold/alphafold/data/templates.py", line 901, in get_templates
    kalign_binary_path=self._kalign_binary_path)
  File "/app/alphafold/alphafold/data/templates.py", line 737, in _process_single_hit
    cif_string = _read_file(cif_path)
  File "/app/alphafold/alphafold/data/templates.py", line 681, in _read_file
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/share/singularity/images/Alphafold/Data/2.1.0/pdb_mmcif/mmcif_files/7byz.cif'
Looking at the obsolete.dat, I see these (I'm only showing the relevant rows):
OBSLTE 13-MAY-20 6L2V 7BYZ
OBSLTE 22-JUL-20 7BYZ 7CH2
OBSLTE 25-NOV-20 7CH2 7DE5
So the chain of obsolete entries goes like this: 6L2V -> 7BYZ -> 7CH2 -> 7DE5, where 7DE5 is the final, non-obsolete successor. I checked the local filesystem and found that 7de5.cif exists.
I think the program has a bug: it does not walk the whole chain of obsolete entries to reach 7de5.cif. It only follows the chain one step, which in this case is 6L2V -> 7BYZ. It then can't find 7byz.cif, and the program errors out.
In data/templates.py, I see what might be the problem. The code looks up the PDB ID/code, but I don't see a "search" that follows the chain from the first obsolete entry to the final successor. Below, the program gets the replacement PDB code as a dictionary value from the parsed obsolete PDBs file. If my assumption is right, it only jumps one step. So in my case, it goes from 6L2V to 7BYZ and stops looking after that.
if hit_pdb_code in obsolete_pdbs:
  hit_pdb_code = obsolete_pdbs[hit_pdb_code]
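For illustration, here is a minimal sketch of what following the whole chain could look like. This is not the code in templates.py: the helper name resolve_obsolete is hypothetical, and it assumes obsolete_pdbs is a plain dict from an obsolete code to its replacement (or None when the entry was removed without a replacement), with codes stored in lowercase.

```python
from typing import Mapping, Optional


def resolve_obsolete(pdb_code: str,
                     obsolete_pdbs: Mapping[str, Optional[str]]) -> Optional[str]:
    """Follow obsolete -> successor links until a non-obsolete code is reached.

    Hypothetical helper, not part of AlphaFold. Returns None if the chain ends
    in an entry with no replacement, or if the mapping contains a cycle.
    """
    seen = set()
    current: Optional[str] = pdb_code
    while current is not None and current in obsolete_pdbs:
        if current in seen:
            # Guard against a malformed file that loops back on itself.
            return None
        seen.add(current)
        current = obsolete_pdbs[current]
    return current


# The three rows from obsolete.dat quoted above, as a lowercase mapping.
obsolete_pdbs = {"6l2v": "7byz", "7byz": "7ch2", "7ch2": "7de5"}
resolved = resolve_obsolete("6l2v", obsolete_pdbs)
print(resolved)
```

With the rows above, the lookup walks 6l2v -> 7byz -> 7ch2 -> 7de5 and stops at the first code that is not itself listed as obsolete. A caller would still need to decide what to do when the result is None, for example skipping the hit.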
My workaround was to create a custom obsolete PDBs file and edit this entry so that the hit is skipped entirely. With that change, the job ran to completion. This is the edit:
I changed the line:
OBSLTE 13-MAY-20 6L2V 7BYZ
into this line:
OBSLTE 13-MAY-20 6L2V
Hey, I ran into a similar problem, but with 6sng.cif and 5cvx.cif, for two different sequences. Unfortunately, I don't quite understand what you mean by creating a custom obsolete.dat.
@andrejberg I copied the obsolete.dat that came with the installation into a new obsolete.dat file in my working directory, because the original file is read-only. I then edited the new copy so that it forces the program to not even bother looking for a .cif that replaces the obsolete structure.
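The same edit can be scripted instead of done by hand. The sketch below is only an illustration under a few assumptions: the function names drop_successor and write_custom_copy are hypothetical, the file paths are placeholders you would supply yourself, and the fields on each OBSLTE line are assumed to be separated by whitespace as in the rows quoted earlier.

```python
import re
from typing import Iterable, List

# Captures everything up to and including the obsoleted ID (third field).
_OBSLTE_PREFIX = re.compile(r"^(OBSLTE\s+\S+\s+(\S+))")


def drop_successor(lines: Iterable[str], pdb_code: str) -> List[str]:
    """Return the lines with the successor removed from the entry for pdb_code.

    Hypothetical helper: every other line is passed through unchanged.
    """
    target = pdb_code.upper()
    edited = []
    for line in lines:
        match = _OBSLTE_PREFIX.match(line)
        if match and match.group(2).upper() == target:
            # Keep "OBSLTE <date> <id>" and cut off the replacement ID(s).
            edited.append(match.group(1) + "\n")
        else:
            edited.append(line)
    return edited


def write_custom_copy(src_path: str, dst_path: str, pdb_code: str) -> None:
    """Read the read-only original and write an edited copy elsewhere."""
    with open(src_path) as src:
        edited = drop_successor(src, pdb_code)
    with open(dst_path, "w") as dst:
        dst.writelines(edited)


sample = [
    "OBSLTE 13-MAY-20 6L2V 7BYZ\n",
    "OBSLTE 22-JUL-20 7BYZ 7CH2\n",
]
edited = drop_successor(sample, "6l2v")
print("".join(edited), end="")
```

The run then has to be pointed at the edited copy instead of the original, for example through the obsolete_pdbs_path flag of run_alphafold.py if your wrapper exposes it.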
@sh999 thank you very much!