af2complex
Error when generating features with `feature_mode='multimer'`
I want to generate features for a protein complex with a modified version of the example/run_fea_gen.sh script:
#!/bin/bash
# An example script for feature generation. It depends heavily on your installation,
# because it relies on many third-party tools and multiple sequence libraries.
#
# You need to take care of these paths, python environment, and third-party sequence tools.
#. load_alphafold ## set up proper AlphaFold conda environment.
DATA_DIR=/ibex/ai/reference/KSL/alphafold/2.3.1
af_dir=../src
if [ $# -eq 0 ]
then
echo "Usage: $0 <seq_file>"
exit 1
fi
fasta_path=$1
out_dir=af2c_fea_test
# choices are "reduced_dbs", "full_dbs", "uniprot"
db_preset='full_dbs'
# choices are "monomer", "multimer", "monomer+species", "monomer+fullpdb"
# Options "monomer" and "multimer" follow the official AlphaFold data pipelines for monomeric and
# multimeric structure predictions, respectively.
#
# Option "monomer+species" is a modified monomeric pipeline in which the species information
# is recorded for MSA pairing using only monomeric input features. This option is recommended.
#feature_mode='monomer+species'
#
# Option "monomer+fullpdb": in addition to adding species, it uses the multimer template pipeline
# rather than the template pipeline of the original monomer modeling. The multimer template pipeline
# searches the full PDB for templates, which is more comprehensive than the monomer template pipeline.
# feature_mode='monomer+fullpdb'
feature_mode='multimer'
#max_template_date=2020-05-15 # CASP14 starting date
max_template_date=$(date +"%Y-%m-%d") # current date
echo "Info: sequence file is $fasta_path"
echo "Info: out_dir is $out_dir"
echo "Info: db_preset is $db_preset"
echo "Info: feature mode is $feature_mode"
echo "Info: max_template_date is $max_template_date"
##########################################################################################
python $af_dir/run_af2c_fea.py --fasta_paths=$fasta_path --db_preset=$db_preset \
--data_dir=$DATA_DIR --output_dir=$out_dir \
--uniprot_database_path=$DATA_DIR/uniprot/uniprot.fasta \
--uniref90_database_path=$DATA_DIR/uniref90/uniref90.fasta \
--mgnify_database_path=$DATA_DIR/mgnify/mgy_clusters_2022_05.fa \
--pdb_seqres_database_path=$DATA_DIR/pdb_seqres/pdb_seqres.txt \
--bfd_database_path=$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=$DATA_DIR/uniref30/UniRef30_2022_02 \
--template_mmcif_dir=$DATA_DIR/pdb_mmcif/mmcif_files \
--max_template_date=$max_template_date \
--obsolete_pdbs_path=$DATA_DIR/pdb_mmcif/obsolete.dat \
--feature_mode=$feature_mode \
--use_precomputed_msas=True
When running the script I obtain the following error:
$ ./run_fea_gen_mod.sh Q9S3U9-6.fasta
Info: sequence file is Q9S3U9-6.fasta
Info: out_dir is af2c_fea_test
Info: db_preset is full_dbs
Info: feature mode is multimer
Info: max_template_date is 2023-03-25
add_species is False
I0325 16:32:42.717077 47109242920640 templates.py:857] Using precomputed obsolete pdbs /ibex/ai/reference/KSL/alphafold/2.3.1/pdb_mmcif/obsolete.dat.
I0325 16:32:42.721372 47109242920640 run_af2c_fea.py:282] Using random seed 372986757380479995 for the data pipeline
Info: working on target Q9S3U9-6 at gpu202-23-l
I0325 16:32:42.721538 47109242920640 run_af2c_fea.py:144] Predicting Q9S3U9-6
I0325 16:32:42.726290 47109242920640 pipeline_multimer.py:287] Running monomer pipeline on chain A: sp|Q9S3U9|VIOC_CHRVO
I0325 16:32:42.726786 47109242920640 jackhmmer.py:133] Launching subprocess "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/jackhmmer -o /dev/null -A /tmp/tmp5q6it5mi/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpvq51lmhm.fasta /ibex/ai/reference/KSL/alphafold/2.3.1/uniref90/uniref90.fasta"
I0325 16:32:42.730009 47109242920640 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0325 16:37:28.661425 47109242920640 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 285.931 seconds
I0325 16:37:28.665437 47109242920640 jackhmmer.py:133] Launching subprocess "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/jackhmmer -o /dev/null -A /tmp/tmpbc32gpxf/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpvq51lmhm.fasta /ibex/ai/reference/KSL/alphafold/2.3.1/mgnify/mgy_clusters_2022_05.fa"
I0325 16:37:28.670499 47109242920640 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0325 16:47:29.123045 47109242920640 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 600.452 seconds
I0325 16:47:29.134068 47109242920640 hmmbuild.py:121] Launching subprocess ['/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/hmmbuild', '--hand', '--amino', '/tmp/tmpe_2th29r/output.hmm', '/tmp/tmpe_2th29r/query.msa']
I0325 16:47:29.147607 47109242920640 utils.py:36] Started hmmbuild query
I0325 16:47:29.319181 47109242920640 hmmbuild.py:128] hmmbuild stdout:
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file: /tmp/tmpe_2th29r/query.msa
# output HMM file: /tmp/tmpe_2th29r/output.hmm
# input alignment is asserted as: protein
# model architecture construction: hand-specified by RF annotation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# idx name nseq alen mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1 query 505 156 120 3.48 0.590
# CPU time: 0.15u 0.00s 00:00:00.15 Elapsed: 00:00:00.15
stderr:
I0325 16:47:29.319365 47109242920640 utils.py:40] Finished hmmbuild query in 0.172 seconds
I0325 16:47:29.319745 47109242920640 hmmsearch.py:103] Launching sub-process ['/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/bin/hmmsearch', '--noali', '--cpu', '8', '--F1', '0.1', '--F2', '0.1', '--F3', '0.1', '--incE', '100', '-E', '100', '--domE', '100', '--incdomE', '100', '-A', '/tmp/tmpzilc_m4o/output.sto', '/tmp/tmpzilc_m4o/query.hmm', '/ibex/ai/reference/KSL/alphafold/2.3.1/pdb_seqres/pdb_seqres.txt']
I0325 16:47:29.331137 47109242920640 utils.py:36] Started hmmsearch (pdb_seqres.txt) query
I0325 16:47:38.230762 47109242920640 utils.py:40] Finished hmmsearch (pdb_seqres.txt) query in 8.899 seconds
Traceback (most recent call last):
File "../src/run_af2c_fea.py", line 309, in <module>
app.run(main)
File "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/ibex/sw/csg/alphafold/2.3.1/el7.9_conda/miniconda3/envs/alphafold_2.3.1/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "../src/run_af2c_fea.py", line 289, in main
predict_structure(
File "../src/run_af2c_fea.py", line 155, in predict_structure
feature_dict = data_pipeline.process(
File "/ibex/user/guzmanfj/af2complex/src/alphafold/data/pipeline_multimer.py", line 341, in process
chain_features = self._process_single_chain(
File "/ibex/user/guzmanfj/af2complex/src/alphafold/data/pipeline_multimer.py", line 289, in _process_single_chain
chain_features = self._monomer_data_pipeline.process(
File "/ibex/user/guzmanfj/af2complex/src/alphafold/data/pipeline.py", line 238, in process
msa_runner=self.hhblits_bfd_uniref_runner,
AttributeError: 'DataPipeline' object has no attribute 'hhblits_bfd_uniref_runner'
These are the contents of the Q9S3U9-6.fasta input file:
>sp|Q9S3U9|VIOC_CHRVO
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
>sp|Q9S3U9|VIOC_CHRVO
MKRAIIVGGGLAGGLTAIYLAKRGYEVHVVEKRGDPLRDLSSYVDVVSSRAIGVSMTVRG
IKSVLAAGIPRAELDACGEPIVAMAFSVGGQYRMRELKPLEDFRPLSLNRAAFQKLLNKY
Thank you for reporting this bug. It was caused by the renaming of a variable that affects the MSA search on the UniRef30 library. I pushed a fix. Please give it a try.
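For anyone who hits a similar traceback, here is a minimal sketch of how a rename like this can produce the AttributeError above. It is not the actual af2complex code: the class `MsaPipelineSketch` and the attribute name assigned in `__init__` are made up for illustration, and only `hhblits_bfd_uniref_runner` is taken from the traceback.

```python
class MsaPipelineSketch:
    """Hypothetical stand-in for a data pipeline class (not the af2complex code)."""

    def __init__(self):
        # Assumption for illustration: the runner is stored under one name...
        self.hhblits_bfd_uniclust_runner = "placeholder for an HHblits runner"

    def process(self):
        # ...while this call site looks it up under a different name, so the
        # attribute lookup fails when process() is called.
        return self.hhblits_bfd_uniref_runner


def run_sketch():
    """Return the AttributeError message raised by the mismatched names."""
    try:
        MsaPipelineSketch().process()
    except AttributeError as err:
        return str(err)
    return None


message = run_sketch()
print(message)
```

The error only appears once `process()` reaches the mismatched line, which would be consistent with the log above, where the jackhmmer and hmmsearch steps finish before the traceback.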
It seems to work now; it produced the features.pkl file. Thank you for your help!