alphafold
issues with uniprot.fasta file integrity when returning jackhmmer results
Similar to bug #465, I also get errors like the one below:
I0516 08:33:11.303210 140061298210624 jackhmmer.py:133] Launching subprocess "/home/user/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpx9qnk8ys/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpxejxmx66.fasta /data/quick_share/alphafold2/db_v2.2.0/uniprot/uniprot.fasta"
I0516 08:33:11.339200 140061298210624 utils.py:36] Started Jackhmmer (uniprot.fasta) query
I0516 08:35:12.610642 140061298210624 utils.py:40] Finished Jackhmmer (uniprot.fasta) query in 121.271 seconds
Traceback (most recent call last):
  File "/home/user/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
    app.run(main)
  File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/user/alphafold-2.2.0/run_alphafold.py", line 398, in main
    predict_structure(
  File "/home/user/alphafold-2.2.0/run_alphafold.py", line 172, in predict_structure
    feature_dict = data_pipeline.process(
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 264, in process
    chain_features = self._process_single_chain(
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 219, in _process_single_chain
    all_seq_msa_features = self._all_seq_msa_features(chain_fasta_path,
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 227, in _all_seq_msa_features
    result = pipeline.run_msa_tool(
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline.py", line 96, in run_msa_tool
    result = msa_runner.query(input_fasta_path)[0]
  File "/home/user/alphafold-2.2.0/alphafold/data/tools/jackhmmer.py", line 171, in query
    single_chunk_result = self._query_chunk(
  File "/home/user/alphafold-2.2.0/alphafold/data/tools/jackhmmer.py", line 142, in _query_chunk
    raise RuntimeError(
RuntimeError: Jackhmmer failed
stderr:
Error: Parse failed (sequence file /data/quick_share/alphafold2/db_v2.2.0/uniprot/uniprot.fasta):
Line 484402440: illegal character
I tried modifying the code to ignore hits that contain these issues, but so far I can't find a clean way to do it: the queries run as multiprocessing Pool calls whose results are collected into an array, and the elements of that array can't be None.
@avilella Check line 484402440; you probably have a '*' character in the sequence FASTA file.
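One way to check that line without opening the whole file in an editor is sketched below. It is only a rough illustration and not part of AlphaFold: the helper names `read_line` and `suspicious_characters` are made up here, and the rule "anything that is not an ASCII letter is suspicious" is an assumption, since the exact set of characters jackhmmer rejects may differ.

```python
import itertools
import string

# Assumption: only ASCII letters count as unremarkable residue symbols.
# jackhmmer's real rules may be more permissive or stricter than this.
ALLOWED = set(string.ascii_letters)


def read_line(path, line_number):
    """Return the 1-indexed line `line_number` of `path`, or None if the file is shorter.

    The file is streamed, so a very large FASTA file is never held in memory.
    """
    with open(path, "r", errors="replace") as handle:
        for line in itertools.islice(handle, line_number - 1, line_number):
            return line.rstrip("\n")
    return None


def suspicious_characters(line):
    """Return (column, character) pairs for non-letter characters in a sequence line.

    Header lines (starting with '>') are skipped, because they legitimately
    contain punctuation.
    """
    if line.startswith(">"):
        return []
    return [(col, ch) for col, ch in enumerate(line, start=1) if ch not in ALLOWED]


# Example usage with the path and line number from the error message above:
# line = read_line("/data/quick_share/alphafold2/db_v2.2.0/uniprot/uniprot.fasta", 484402440)
# print(repr(line), suspicious_characters(line))
```

Printing the line with `repr` also exposes invisible characters, such as stray control bytes from an interrupted download, which would otherwise be easy to miss.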