openfold
Failing to run the inference script
No such file or directory: 'fasta_dir'
I created my conda environment and downloaded all the data listed in the README. However, I could not run the inference script successfully: it reported that a file or directory called 'fasta_dir' is missing. Is there a step I missed that generates this file or directory? Thanks a lot!
Detail
I followed the README and entered:
python3 run_pretrained_openfold.py \
fasta_dir \
data/pdb_mmcif/mmcif_files/ \
--uniref90_database_path data/uniref90/uniref90.fasta \
--mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path data/pdb70/pdb70 \
--uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--model_device "cuda:0" \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign
--config_preset "model_1_ptm"
--openfold_checkpoint_path openfold/resources/openfold_params/finetuning_2_ptm.pt
However, I got this error.
File "run_pretrained_openfold.py", line 499, in <module>
main(args)
File "run_pretrained_openfold.py", line 332, in main
for fasta_file in list_files_with_extensions(args.fasta_dir, (".fasta", ".fa")):
File "run_pretrained_openfold.py", line 299, in list_files_with_extensions
return [f for f in os.listdir(dir) if f.endswith(extensions)]
FileNotFoundError: [Errno 2] No such file or directory: 'fasta_dir'
zsh: command not found: --config_preset
zsh: command not found: --openfold_checkpoint_path
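Those are just symbolic names. You need to change fasta_dir etc. to the names of actual directories containing the corresponding files. fasta_dir should be a directory containing .fasta files whose structures you want to predict, and so on. The two "command not found" messages are a separate problem: the --kalign_binary_path and --config_preset lines have no trailing backslash, so zsh ran the lines after them as separate commands.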
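To check a directory before launching a long inference run, something like the following may help. It is only a sketch: list_fasta_files is a hypothetical helper, not part of OpenFold, and it simply mirrors the extension filter visible in the traceback above (".fasta" and ".fa").

```python
import os


def list_fasta_files(fasta_dir, extensions=(".fasta", ".fa")):
    """Return the sorted FASTA file names found in fasta_dir.

    Hypothetical helper (not part of OpenFold). It applies the same
    extension filter that appears in the traceback, but fails early
    with a clearer message when the path is still a placeholder.
    """
    if not os.path.isdir(fasta_dir):
        raise FileNotFoundError(
            f"{fasta_dir!r} is not an existing directory; "
            "replace the README placeholder with a real path"
        )
    # str.endswith accepts a tuple, so both extensions are matched at once.
    return sorted(f for f in os.listdir(fasta_dir) if f.endswith(extensions))
```

Calling list_fasta_files("fasta_dir") with the literal placeholder raises the error immediately, and an empty list means the directory exists but holds no .fasta or .fa files.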
@gahdritz Thanks! I noticed that this project provides a script that generates a FASTA file from mmCIF files. I wonder whether the following command will generate the right input for the inference script.
python data_dir_to_fasta.py --data_dir data/pdb_mmcif/mmcif_files --output_path fasta_dir
That script consolidates a bunch of .mmcif files into a single .fasta file, so it's not suitable for this: the inference script interprets multi-sequence .fasta files as complexes. If you have a bunch of .mmcif files you want to run inference on, you should split up the sequences into individual .fasta files and then place them in a single directory. This directory is what you would pass to the inference script as the fasta_dir parameter.
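If you already have one consolidated .fasta file, the splitting step could look roughly like this. It is a minimal sketch, not an OpenFold utility: read_fasta and split_fasta are hypothetical names, and the code assumes plain FASTA input where each record starts with a ">" header line and the first token of the header is usable as a file name.

```python
import os


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Start a new record; the header is stored without ">".
                records.append([line[1:], []])
            elif records:
                # Sequence lines may be wrapped, so collect the pieces.
                records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def split_fasta(multi_fasta_path, out_dir):
    """Write every record of a multi-sequence FASTA to its own file.

    Hypothetical helper. Files are named after the first token of each
    header, so records sharing that token would overwrite each other.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for index, (header, sequence) in enumerate(read_fasta(multi_fasta_path)):
        tokens = header.split()
        tag = tokens[0] if tokens else f"seq_{index}"
        # Keep only characters that are safe in file names.
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in tag)
        out_path = os.path.join(out_dir, f"{safe}.fasta")
        with open(out_path, "w") as out:
            out.write(f">{header}\n{sequence}\n")
        written.append(out_path)
    return written
```

The directory given as out_dir then contains one single-sequence file per record, which is the layout described above for the inference script.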