openfold
Soloseq inference - can't fold using ESM-1b alone
In the README it states that template finding for SoloSeq will be skipped if no tools or dbs are passed, and the fold will happen using ESM alone. However I get multiple errors when running the following command:
python openfold/run_pretrained_openfold.py \
meth_fastas \
openfold/data/pdb_mmcif/mmcif_files \
--output_dir results \
--model_device "cuda:0" \
--config_preset "seq_model_esm1b_ptm" \
--openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt
- If the mmcif_files are not downloaded, the script fails; I had to download them to get past this.
- Removing all args related to tools and db like above, I still get a HHSearch error:
INFO:/root/openfold/openfold/openfold/utils/script_utils.py:Loaded OpenFold parameters at openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt...
INFO:/root/openfold/openfold/run_pretrained_openfold.py:Generating alignments for MGYG000000044_01310...
Traceback (most recent call last):
File "/root/openfold/openfold/run_pretrained_openfold.py", line 470, in <module>
main(args)
File "/root/openfold/openfold/run_pretrained_openfold.py", line 275, in main
precompute_alignments(tags, seqs, alignment_dir, args)
File "/root/openfold/openfold/run_pretrained_openfold.py", line 80, in precompute_alignments
template_searcher = hhsearch.HHSearch(
File "/root/openfold/openfold/openfold/data/tools/hhsearch.py", line 58, in __init__
if not glob.glob(database_path + "_*"):
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
It still seems to be attempting to generate alignments. Is this an error? Do I need to specify another flag to skip every tool + alignment and just use ESM based folding?
Thank you.
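For anyone reading the traceback above: the last frame evaluates database_path + "_*", and the TypeError indicates that database_path is None at that point, which is presumably what happens when no template database argument is passed. A minimal reproduction in plain Python, with no OpenFold import (probe_database is a hypothetical helper name; only the expression itself is taken from the traceback):

```python
import glob


def probe_database(database_path):
    """Mimic the expression from the last frame of the traceback."""
    return glob.glob(database_path + "_*")


try:
    # Passing None stands in for "no database path was supplied".
    probe_database(None)
except TypeError as err:
    message = str(err)
    print(message)
```

This only illustrates why the error message mentions 'NoneType' and 'str'; it says nothing about how the script decides whether to run the template search.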
Hi @amardeepranu, thanks for your interest in Soloseq.
Could you try generating the embeddings first using:
python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/
And then using the same run_pretrained_openfold.py command, but with --use_precomputed_alignments=embeddings_output_dir
@jnwei thanks, that worked but now I'm getting:
FileNotFoundError: [Errno 2] No such file or directory: 'openfold/resources/params/params_model_1.npz'
bash: line 6: --output_dir: command not found
Seems like it requires --jax_param_path? Is this required to run the folding?
There is a separate set of weights used for soloseq, which was specified in your earlier command by this argument: --openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt
Judging by the "bash: line 6: --output_dir: command not found" message, perhaps there's a whitespace/newline character issue in the command?
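A "command not found" error for an option such as --output_dir usually means the shell treated that line as a new command, for example because the previous line's backslash is missing or is followed by trailing whitespace. The sketch below is a rough checker for those two cases only. find_broken_continuations is a hypothetical helper, not part of OpenFold, and it assumes option lines start with "--".

```python
import re


def find_broken_continuations(command_text):
    """Return (line_number, reason) pairs for suspicious line continuations."""
    lines = command_text.split("\n")
    problems = []
    for number, line in enumerate(lines, start=1):
        # A backslash followed by spaces, tabs, or a carriage return does not
        # continue the line in bash.
        if re.search(r"\\[ \t\r]+$", line):
            problems.append((number, "whitespace after backslash"))
        # If the next line starts with an option, this line should end with
        # a backslash.
        following = lines[number] if number < len(lines) else ""
        if following.lstrip().startswith("--") and not line.rstrip().endswith("\\"):
            problems.append((number, "missing backslash before option line"))
    return problems


example = (
    "python openfold/run_pretrained_openfold.py \\ \n"
    "  fastas\n"
    "  --output_dir results \\\n"
    '  --model_device "cuda:0"'
)
issues = find_broken_continuations(example)
for number, reason in issues:
    print(f"line {number}: {reason}")
```

Pasting the failing command into a string and running it through a check like this can narrow down which line break the shell is splitting on.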
@jnwei this is my full command:
python openfold/run_pretrained_openfold.py \
fastas \
--use_precomputed_alignments embeddings/meth \
--output_dir results \
--model_device "cuda:0" \
--config_preset "seq_model_esm1b_ptm" \
--openfold_checkpoint_path openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt
With this I get an error demanding that template_mmcif_dir be included. Are templates required when running ESM-only folding?
You will need to provide a directory for --template_mmcif_dir. Although the argument is required, templates are not necessary for folding predictions.
If your precomputed alignments directory does not contain any alignments for templates (e.g. it only contains the pre-computed ESM embeddings), then template structures will not be used for creating the prediction.
In the future, we may refactor the inference script so that the soloseq mode does not require a template_mmcif_dir when the template-based prediction path is not used, to help avoid this confusion.
Note that the directory cannot be empty, so if you want to use no templates at all, at the moment you still need to fake at least one with something like touch empty.cif.
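Putting the thread together, the workaround seems to be: precompute the ESM embeddings, point --use_precomputed_alignments at them, and pass a non-empty placeholder directory for the templates. The sketch below only builds the two command lines and creates the placeholder; it does not run OpenFold. Both function names are hypothetical. The script locations are assumed to be relative to a clone in ./openfold, and the template directory is assumed to be the second positional argument, as in the first command in this thread.

```python
import shlex
from pathlib import Path, PurePosixPath


def ensure_placeholder_template_dir(template_dir):
    """Create template_dir if needed and make sure it holds at least one file."""
    path = Path(template_dir)
    path.mkdir(parents=True, exist_ok=True)
    if not any(path.iterdir()):
        # Same effect as the `touch empty.cif` suggestion in the thread.
        (path / "empty.cif").touch()
    return path


def build_soloseq_commands(fasta_dir, embeddings_dir, template_dir, output_dir,
                           checkpoint_path, repo_dir="openfold", device="cuda:0"):
    """Return (precompute_cmd, fold_cmd) as argument lists; nothing is executed."""
    repo = PurePosixPath(repo_dir)  # assumption: location of the OpenFold clone
    precompute_cmd = [
        "python", str(repo / "scripts" / "precompute_embeddings.py"),
        str(fasta_dir), str(embeddings_dir),
    ]
    fold_cmd = [
        "python", str(repo / "run_pretrained_openfold.py"),
        str(fasta_dir),
        str(template_dir),  # assumed positional, as in the first command above
        "--use_precomputed_alignments", str(embeddings_dir),
        "--output_dir", str(output_dir),
        "--model_device", device,
        "--config_preset", "seq_model_esm1b_ptm",
        "--openfold_checkpoint_path", str(checkpoint_path),
    ]
    return precompute_cmd, fold_cmd


precompute_cmd, fold_cmd = build_soloseq_commands(
    fasta_dir="fastas",
    embeddings_dir="embeddings/meth",
    template_dir="placeholder_templates",
    output_dir="results",
    checkpoint_path="openfold/openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt",
)
print(shlex.join(precompute_cmd))
print(shlex.join(fold_cmd))
```

Call ensure_placeholder_template_dir("placeholder_templates") before running the second command, and adjust the paths to your own layout. Building the arguments as a list also sidesteps the backslash and newline problems discussed earlier, since the list can be passed directly to subprocess.run.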