
issues with uniprot.fasta file integrity when returning jackhmmer results

Open: avilella opened this issue 2 years ago • 1 comment

Similar to bug #465, I also get errors like the one below:

I0516 08:33:11.303210 140061298210624 jackhmmer.py:133] Launching subprocess "/home/user/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpx9qnk8ys/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpxejxmx66.fasta /data/quick_share/alphafold2/db_v2.2.0/uniprot/uniprot.fasta"
I0516 08:33:11.339200 140061298210624 utils.py:36] Started Jackhmmer (uniprot.fasta) query                                                                                                                                                                  
I0516 08:35:12.610642 140061298210624 utils.py:40] Finished Jackhmmer (uniprot.fasta) query in 121.271 seconds                                                                                                                                              
Traceback (most recent call last):                                                                                                                                                                                                                          
  File "/home/user/alphafold-2.2.0/run_alphafold.py", line 422, in <module>                                                                                                                                                                             
    app.run(main)                                                                                                                                                                                                                                           
  File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run                                                                                                                                                 
    _run_main(main, args)                                                                                                                                                                                                                                   
  File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main                                                                                                                                           
    sys.exit(main(argv))                                                                                                      
  File "/home/user/alphafold-2.2.0/run_alphafold.py", line 398, in main                                                                                                                                                                                 
    predict_structure(                                                                                                        
  File "/home/user/alphafold-2.2.0/run_alphafold.py", line 172, in predict_structure                                      
    feature_dict = data_pipeline.process(                                                                                                                                                                                                                   
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 264, in process                                                                                                                                                           
    chain_features = self._process_single_chain(                                                                                                                                                                                                            
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 219, in _process_single_chain                                                                                                                                             
    all_seq_msa_features = self._all_seq_msa_features(chain_fasta_path,                                                                                                                                                                                     
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 227, in _all_seq_msa_features                                                                                                                                             
    result = pipeline.run_msa_tool(                                                                                                                                                                                                                         
  File "/home/user/alphafold-2.2.0/alphafold/data/pipeline.py", line 96, in run_msa_tool                                                                                                                                                                
    result = msa_runner.query(input_fasta_path)[0]                                                                                                                                                                                                          
  File "/home/user/alphafold-2.2.0/alphafold/data/tools/jackhmmer.py", line 171, in query                                                                                                                                                               
    single_chunk_result = self._query_chunk(                                                                                                                                                                                                                
  File "/home/user/alphafold-2.2.0/alphafold/data/tools/jackhmmer.py", line 142, in _query_chunk                                                                                                                                                        
    raise RuntimeError(                                                                                                                                                                                                                                     
RuntimeError: Jackhmmer failed                                                                                                                                                                                                                              
stderr:                                                                                                                       
                                                                                                                              
Error: Parse failed (sequence file /data/quick_share/alphafold2/db_v2.2.0/uniprot/uniprot.fasta):                                                                                                                                                           
Line 484402440: illegal character 

I tried modifying the code to skip hits that trigger this error, but so far I can't find a clean way to do it: the work runs as multiprocessing Pool calls whose results are collected into an array, and the elements of that array can't be None.

avilella, May 17 '22 09:05

@avilella check line 484402440 of the FASTA file named in the error; the sequence there probably contains a '*' character.
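One way to follow up on that suggestion is to scan the database for sequence lines that contain unexpected characters. The sketch below is only an illustration: `find_illegal_lines` and `ALLOWED` are hypothetical names, and the rule "sequence lines contain only ASCII letters" is a conservative assumption, not jackhmmer's actual validation rule. Line numbers may also differ slightly from the ones jackhmmer reports if the file has unusual line endings.

```python
import string
from typing import Iterator, Tuple

# Assumption: a valid sequence line contains only ASCII letters.
# This is a conservative guess for illustration, not jackhmmer's real rule.
ALLOWED = set(string.ascii_letters)


def find_illegal_lines(fasta_path: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, offending_characters, line_text) for suspicious lines."""
    # errors="replace" keeps the scan going if the file contains non-UTF-8 bytes.
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            if not text or text.startswith(">"):
                continue  # skip blank lines and FASTA header lines
            bad = sorted(set(text) - ALLOWED)
            if bad:
                yield line_number, "".join(bad), text
```

Iterating over `find_illegal_lines("/data/quick_share/alphafold2/db_v2.2.0/uniprot/uniprot.fasta")` and printing each tuple should show whether line 484402440 is an isolated problem or one of many, which may help decide between editing the file and downloading it again.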

shahryary, May 18 '22 06:05