
Alignment taking too long

Open calmasri opened this issue 2 years ago • 4 comments

I was trying to generate new alignments using the precompute_alignments_mmseqs.py script:

python3 scripts/precompute_alignments_mmseqs.py /fasta_dir/query_seqs.fasta \
    data/mmseqs_dbs \
    uniref30_2103_db \
    /fasta_dir \
    /data/MMseqs2/build/bin/mmseqs \
    --hhsearch_binary_path /usr/bin/hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 data/pdb70/pdb70

Here, query_seqs.fasta was generated with scripts/data_dir_to_fasta.py and contains almost all the structures in data/pdb_mmcif/mmcif_files (minus ~500-1000 structures).

I'm running on a machine with the following specs: 4 Tesla V100 GPUs, 64 GB GPU memory, 32 CPUs, 244 GB memory.

The script has been running for about 5 days now, and I'm not sure whether that's normal. How long should it normally take, and would I need more than 3TB of storage space allocated for the output?

calmasri avatar Aug 17 '22 10:08 calmasri

Are you regenerating PDB alignments? There's no need to do that; we've pre-computed them all. See the RODA repository linked in the README.
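
If only some of the sequences in the query file lack alignments, one option is to trim the FASTA before running the script so that nothing is recomputed. A minimal sketch, not part of OpenFold: it assumes the existing alignments sit in a directory with one subdirectory per sequence, named after the first token of the FASTA header. That layout, and the helper names `read_fasta` and `filter_missing`, are assumptions for illustration, so check them against your own data first.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one first.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_missing(fasta_in, alignment_dir, fasta_out):
    """Write only the records that have no alignment subdirectory yet.

    ASSUMPTION: alignment_dir holds one subdirectory per sequence, named
    after the first whitespace-delimited token of the FASTA header.
    Returns the number of records written.
    """
    done = {p.name for p in Path(alignment_dir).iterdir() if p.is_dir()}
    kept = 0
    with open(fasta_out, "w") as out:
        for header, seq in read_fasta(fasta_in):
            tokens = header.split()
            seq_id = tokens[0] if tokens else ""
            if seq_id not in done:
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept
```

The trimmed file can then be passed to the alignment script in place of the full query file.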

gahdritz avatar Aug 18 '22 05:08 gahdritz

Same issue here. I used the same script, as outlined in the README. The query sequence is just a regular 237 aa sequence:

python3 scripts/precompute_alignments_mmseqs.py /fasta_dir/query_seqs.fasta \
    data/mmseqs_dbs \
    uniref30_2103_db \
    /fasta_dir \
    /data/MMseqs2/build/bin/mmseqs \
    --hhsearch_binary_path /usr/bin/hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 data/pdb70/pdb70

lzhangUT avatar Aug 18 '22 07:08 lzhangUT

Hi, @gahdritz, how much space does this data need?

player1321 avatar Dec 22 '22 09:12 player1321

I think the entire thing is around 2TB, but you can download subsets of it.

gahdritz avatar Jan 29 '23 07:01 gahdritz