
Failing to run the inference script

Open liuyixin-louis opened this issue 3 years ago • 3 comments
No such file or directory: 'fasta_dir'

I have created my conda environment and downloaded all the data required by the README. However, I couldn't run the inference script successfully. The error says that a file or directory called 'fasta_dir' is missing. Is there a step I missed that generates this file or directory? Thanks a lot!

Detail

I followed the README and entered:

python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --output_dir ./ \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --model_device "cuda:0" \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign
    --config_preset "model_1_ptm"
    --openfold_checkpoint_path openfold/resources/openfold_params/finetuning_2_ptm.pt

However, I got this error.

  File "run_pretrained_openfold.py", line 499, in <module>
    main(args)
  File "run_pretrained_openfold.py", line 332, in main
    for fasta_file in list_files_with_extensions(args.fasta_dir, (".fasta", ".fa")):
  File "run_pretrained_openfold.py", line 299, in list_files_with_extensions
    return [f for f in os.listdir(dir) if f.endswith(extensions)]
FileNotFoundError: [Errno 2] No such file or directory: 'fasta_dir'
zsh: command not found: --config_preset
zsh: command not found: --openfold_checkpoint_path
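
The two `zsh: command not found` lines are a separate problem from the `FileNotFoundError`: the `--kalign_binary_path` and `--config_preset` lines of the command above do not end in a backslash, so the shell ends the command there and tries to run the remaining lines as commands of their own. One way to sidestep line continuations entirely is to build the argument list in Python. The sketch below is only an illustration: `build_command` is a hypothetical helper, `my_fasta_dir` is a placeholder directory name, and only a subset of the flags from the command above is shown.

```python
import shlex


def build_command(fasta_dir, mmcif_dir, options):
    """Return the argv list for run_pretrained_openfold.py.

    `options` maps flag names (as used in the command above) to values.
    """
    argv = ["python3", "run_pretrained_openfold.py", fasta_dir, mmcif_dir]
    for flag, value in options.items():
        argv += [flag, value]
    return argv


cmd = build_command(
    "my_fasta_dir",  # placeholder: a real directory containing .fasta files
    "data/pdb_mmcif/mmcif_files/",
    {
        "--output_dir": "./",
        "--model_device": "cuda:0",
        "--config_preset": "model_1_ptm",
        "--openfold_checkpoint_path": "openfold/resources/openfold_params/finetuning_2_ptm.pt",
    },
)

# Print the full command on one line so it can be inspected before running.
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Because every flag and value is a separate list element, no option can be split off into its own shell command.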

liuyixin-louis, Jul 04 '22 07:07

Those are just symbolic names. You need to change fasta_dir etc. to the names of actual directories containing the corresponding files. fasta_dir should be a directory containing .fasta files whose structures you want to predict, and so on.
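
A quick pre-flight check can catch this before the inference script starts. The sketch below assumes, based on the traceback above, that the script only picks up files ending in `.fasta` or `.fa`; `find_fasta_files` is a hypothetical helper and not part of OpenFold.

```python
import os

# Extensions taken from the traceback in the original report.
FASTA_EXTENSIONS = (".fasta", ".fa")


def find_fasta_files(fasta_dir):
    """Return the sorted FASTA file names in fasta_dir, or raise a clear error."""
    if not os.path.isdir(fasta_dir):
        raise FileNotFoundError(
            f"{fasta_dir!r} is not an existing directory; "
            "replace the placeholder name with a real path"
        )
    names = sorted(
        name for name in os.listdir(fasta_dir) if name.endswith(FASTA_EXTENSIONS)
    )
    if not names:
        raise ValueError(f"no .fasta or .fa files found in {fasta_dir!r}")
    return names


# Example (hypothetical path): find_fasta_files("my_fasta_dir")
```

If the call returns a non-empty list, that same path can be passed as the first positional argument of the inference command.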

gahdritz, Jul 04 '22 16:07

@gahdritz Thanks! I noticed that this project provides a script that generates a FASTA file from mmCIF files. I wonder whether the following command will generate the right input for the inference program.

python data_dir_to_fasta.py --data_dir data/pdb_mmcif/mmcif_files --output_path fasta_dir 

liuyixin-louis, Jul 05 '22 14:07

That script consolidates a bunch of .mmcif files into one single .fasta file, so it's not suitable for this: the inference script interprets multi-sequence .fasta files as complexes. If you have a bunch of .mmcif files you want to run inference on, you should split up the sequences into individual .fasta files and then place them in a single directory. This directory is what you would pass to the inference script as the fasta_dir argument.
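
The splitting step could be sketched roughly as follows. This assumes the consolidated file is standard FASTA (a `>` header line followed by one or more sequence lines); `split_fasta` and the output naming scheme are hypothetical and not part of OpenFold.

```python
import os
import re


def split_fasta(multi_fasta_path, output_dir):
    """Write each record of a multi-sequence FASTA file to its own file.

    Returns the list of paths written, in input order.
    """
    os.makedirs(output_dir, exist_ok=True)

    records = []  # list of (header, [sequence lines])
    with open(multi_fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records.append((line[1:], []))
            elif records:
                records[-1][1].append(line)

    written = []
    for index, (header, seq_lines) in enumerate(records):
        # Use the first word of the header as a file-system-safe name;
        # the index prefix keeps names unique if headers repeat.
        words = header.split()
        first = words[0] if words else "seq"
        safe = re.sub(r"[^A-Za-z0-9_.-]", "_", first)
        path = os.path.join(output_dir, f"{index:05d}_{safe}.fasta")
        sequence = "".join(seq_lines)
        with open(path, "w") as out:
            out.write(f">{header}\n{sequence}\n")
        written.append(path)
    return written


# Example (hypothetical file names):
# split_fasta("all_chains.fasta", "my_fasta_dir")
```

Each output file then holds exactly one sequence, so the inference script would treat it as a single chain rather than as part of a complex.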

gahdritz, Jul 08 '22 20:07