Alignment taking too long
I was trying to generate new alignments using the precompute_alignments_mmseqs.py script:
python3 scripts/precompute_alignments_mmseqs.py /fasta_dir/query_seqs.fasta \
    data/mmseqs_dbs \
    uniref30_2103_db \
    /fasta_dir \
    /data/MMseqs2/build/bin/mmseqs \
    --hhsearch_binary_path /usr/bin/hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 data/pdb70/pdb70
Here, query_seqs.fasta was generated with scripts/data_dir_to_fasta.py and contains almost all of the structures in data/pdb_mmcif/mmcif_files (minus ~500-1000 structures).
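For reference, this is roughly how that FASTA could be produced. This is only a sketch: the positional arguments (data directory in, FASTA path out) are assumptions, so check the script's --help for the actual interface.

# Assumed invocation: a directory of mmCIF files in, a single FASTA out.
# Verify with `python3 scripts/data_dir_to_fasta.py --help`.
python3 scripts/data_dir_to_fasta.py \
    data/pdb_mmcif/mmcif_files \
    /fasta_dir/query_seqs.fasta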
I'm running on a machine with the following specs: 4× Tesla V100 GPUs (64 GB GPU memory), 32 CPUs, 244 GB RAM.
The script has been running for about 5 days now, and I'm not sure if that's normal. How long should it normally take, and would I need more than 3 TB of storage allocated for the output?
Are you regenerating PDB alignments? There's no need to do that; we've pre-computed them all. See the RODA repository linked in the README.
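If the goal is just the PDB alignments, something like the following pulls the pre-computed set with the AWS CLI. The bucket name and prefix here are assumptions on my part, so check the RODA page linked in the README for the real paths.

# Assumed bucket/prefix for the pre-computed OpenProteinSet alignments;
# --no-sign-request reads the public bucket without AWS credentials.
aws s3 ls --no-sign-request s3://openfold/
aws s3 cp --no-sign-request --recursive s3://openfold/pdb/ data/alignments/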
Same issue here. I use the same script as outlined in the README. The query sequence is just a regular 237 aa sequence:
python3 scripts/precompute_alignments_mmseqs.py /fasta_dir/query_seqs.fasta \
    data/mmseqs_dbs \
    uniref30_2103_db \
    /fasta_dir \
    /data/MMseqs2/build/bin/mmseqs \
    --hhsearch_binary_path /usr/bin/hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 data/pdb70/pdb70
Hi, @gahdritz, how much space does this data need?
I think the entire thing is around 2TB, but you can download subsets of it.
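A sketch of grabbing only a subset rather than the full download, again assuming the bucket layout above (per-chain directories under an alignments prefix); verify the actual structure with `aws s3 ls` before copying.

# Download alignments for a handful of chains only, rather than the full set.
# The s3://openfold/pdb/ layout and chain-ID directory names are assumptions.
for chain in 1ak0_A 2drd_A; do
    aws s3 cp --no-sign-request --recursive \
        "s3://openfold/pdb/${chain}/" "data/alignments/${chain}/"
done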