Soloseq inference - can't fold using ESM-1b alone

Open amardeepranu opened this issue 5 months ago • 6 comments

The README states that template finding for SoloSeq will be skipped if no tools or databases are passed, and that the fold will happen using ESM alone. However, I get multiple errors when running the following command:

python openfold/run_pretrained_openfold.py \
    meth_fastas \
    openfold/data/pdb_mmcif/mmcif_files \
    --output_dir results \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
    --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt
  1. If the mmcif_files are not downloaded, the command fails; I had to download them to get past this.
  2. After removing all args related to tools and databases, as above, I still get an HHSearch error:
INFO:/root/openfold/openfold/openfold/utils/script_utils.py:Loaded OpenFold parameters at openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt...
INFO:/root/openfold/openfold/run_pretrained_openfold.py:Generating alignments for MGYG000000044_01310...
Traceback (most recent call last):
  File "/root/openfold/openfold/run_pretrained_openfold.py", line 470, in <module>
    main(args)
  File "/root/openfold/openfold/run_pretrained_openfold.py", line 275, in main
    precompute_alignments(tags, seqs, alignment_dir, args)
  File "/root/openfold/openfold/run_pretrained_openfold.py", line 80, in precompute_alignments
    template_searcher = hhsearch.HHSearch(
  File "/root/openfold/openfold/openfold/data/tools/hhsearch.py", line 58, in __init__
    if not glob.glob(database_path + "_*"):
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
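
The last frame of the traceback above concatenates `database_path` with `"_*"`, which fails when no database path was passed and the value is `None`. The following is a minimal sketch of that failure and of one possible guard. `database_files` is a hypothetical helper written for illustration; it is not part of OpenFold.

```python
import glob
from typing import List, Optional


def database_files(database_path: Optional[str]) -> List[str]:
    """Hypothetical guard: list files matching '<database_path>_*'."""
    if database_path is None:
        # Without this check, None + "_*" raises the TypeError shown above.
        raise ValueError("no template database path was provided")
    return glob.glob(database_path + "_*")


# Reproduce the failing expression from the traceback in isolation.
try:
    glob.glob(None + "_*")
except TypeError as err:
    print(f"TypeError: {err}")
```

A guard like this only turns the failure into a clearer message; it does not by itself skip the alignment step.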

It still seems to be attempting to generate alignments. Is this an error? Do I need to specify another flag to skip every tool + alignment and just use ESM based folding?

Thank you.

amardeepranu avatar Feb 21 '24 21:02 amardeepranu

Hi @amardeepranu , thanks for your interest in Soloseq.

Could you try generating the embeddings first using: python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/

And then using the same run_pretrained_openfold.py command, but with --use_precomputed_alignments=embeddings_output_dir
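
The two steps suggested above can be sketched as a pair of command lines assembled in Python. This only builds and prints the commands, it does not run them. The script names and flags are the ones quoted in this thread; the directory names, the checkpoint path, and the assumption that both scripts are invoked from the repository root are placeholders to adapt.

```python
import shlex
from typing import List, Tuple


def soloseq_commands(
    fasta_dir: str,
    embeddings_dir: str,
    template_mmcif_dir: str,
    output_dir: str,
    checkpoint_path: str,
) -> Tuple[List[str], List[str]]:
    """Build the embedding and folding commands described in this thread."""
    # Step 1: precompute the ESM embeddings for every FASTA file.
    embed = ["python", "scripts/precompute_embeddings.py", fasta_dir, embeddings_dir]
    # Step 2: fold, pointing --use_precomputed_alignments at the embeddings.
    fold = [
        "python", "run_pretrained_openfold.py",
        fasta_dir, template_mmcif_dir,
        "--use_precomputed_alignments", embeddings_dir,
        "--output_dir", output_dir,
        "--model_device", "cuda:0",
        "--config_preset", "seq_model_esm1b_ptm",
        "--openfold_checkpoint_path", checkpoint_path,
    ]
    return embed, fold


# Placeholder paths; replace them with your own directories.
for command in soloseq_commands(
    "fasta_dir",
    "embeddings_output_dir",
    "template_mmcif_dir",
    "results",
    "openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt",
):
    print(shlex.join(command))
```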

jnwei avatar Feb 22 '24 10:02 jnwei

@jnwei thanks, that worked but now I'm getting:

FileNotFoundError: [Errno 2] No such file or directory: 'openfold/resources/params/params_model_1.npz'
bash: line 6: --output_dir: command not found

Seems like it requires --jax_param_path? Is this required to run the folding?

amardeepranu avatar Feb 23 '24 13:02 amardeepranu

There is a separate set of weights used for SoloSeq, which was specified in your earlier command by this argument: --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt

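Judging by the "bash: line 6: --output_dir: command not found" message, perhaps there's a whitespace or newline character issue in the command? Bash treated --output_dir as the start of a new command, which suggests that the line continuation before it was broken.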
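
One way to check for this kind of problem is to scan the command text for broken line continuations. The sketch below is a hypothetical helper, not an OpenFold tool. It flags two common causes: whitespace (or a carriage return) after a trailing backslash, and an option line whose previous line has no trailing backslash at all.

```python
from typing import List, Tuple


def find_broken_continuations(command_text: str) -> List[Tuple[int, str]]:
    """Return (line number, reason) pairs for suspicious line continuations."""
    problems = []
    # Split on "\n" only, so a stray carriage return counts as trailing whitespace.
    lines = command_text.split("\n")
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip()
        # A backslash followed by whitespace no longer escapes the newline.
        if stripped.endswith("\\") and stripped != line:
            problems.append((number, "whitespace after trailing backslash"))
        # An option at the start of a line needs a continuation on the line before.
        if number > 1 and line.lstrip().startswith("--"):
            previous = lines[number - 2]
            if not previous.rstrip().endswith("\\"):
                problems.append((number, "previous line has no trailing backslash"))
    return problems


# Example: the second line has a space after its backslash.
example = "python run_pretrained_openfold.py \\\n    fastas \\ \n    --output_dir results\n"
print(find_broken_continuations(example))
```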

jnwei avatar Feb 23 '24 15:02 jnwei

@jnwei this is my full command:

python openfold/run_pretrained_openfold.py \
    fastas \
    --use_precomputed_alignments embeddings/meth \
    --output_dir results \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
    --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt

With this I get an error demanding that template_mmcif_dir be included. Are templates required when running ESM-only folding?

amardeepranu avatar Feb 23 '24 16:02 amardeepranu

You will need to provide a directory for the --template_mmcif_dir. Despite the required flag, templates are not necessary for folding predictions.

If your precomputed alignments directory does not contain any alignments for templates (e.g. it only contains the pre-computed ESM embeddings), then template structures will not be used for creating the prediction.

In the future, we may refactor the inference script so that SoloSeq mode does not require a template_mmcif_dir when the template-based prediction path is not used, which should help avoid this confusion.

jnwei avatar Feb 27 '24 06:02 jnwei

Note that the directory cannot be empty, so if you want to use no templates at all, for now you still need to fake at least one with something like touch empty.cif.
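
The workaround above can also be sketched in Python. `make_placeholder_template_dir` is a hypothetical helper; it creates the directory if needed and adds an empty `empty.cif`, which is the equivalent of `touch empty.cif`. Whether an empty file is accepted depends on the OpenFold version, so treat this as an assumption based on the comment above.

```python
from pathlib import Path


def make_placeholder_template_dir(directory: str) -> Path:
    """Create `directory` with an empty placeholder .cif file and return its path."""
    template_dir = Path(directory)
    template_dir.mkdir(parents=True, exist_ok=True)
    placeholder = template_dir / "empty.cif"
    # Same effect as `touch empty.cif`: create the file if it does not exist.
    placeholder.touch()
    return placeholder
```

For example, `make_placeholder_template_dir("placeholder_templates")` creates `placeholder_templates/empty.cif`, and that directory can then be passed as the template_mmcif_dir argument.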

vaclavhanzl avatar Feb 29 '24 10:02 vaclavhanzl