ColabFold on running colabfold locally

Hi there,

We plan to run a few hundred complexes (some relatively big), and we've been trying to set up a pipeline that runs locally on our HPC (an actual local run, not remote access to the notebooks).

I've been able to run colabfold_search.sh locally thanks to @milot-mirdita's comment on #70. I used a few complexes as input (just dimers) and the MSA generation ran to completion. One of my questions is about the role of merge_and_split_msas.py, which, based on the comments in the script, is meant to separate the MSAs(?). However, I only got a single a3m file as output, named after the first FASTA header of the input (out of 20).

Another question: colabfold_batch still uses the Google Colab notebooks, not local resources, so it's not an option. I was wondering if there is an easy way to 'pickle' the MSA files into the default intermediate msa.pickle that the advanced notebooks used to generate. It is the standard input for localcolabfold, so it would make things easy, as we could run localcolabfold for the model generation part once the multimer option is released.

Thank you in advance.

Nov 16 '21 07:11 xvazquezc

Unfortunately, we currently do not have a solution for storing the paired+unpaired MSA for complex predictions. The msa.pickle might be a solution, but we would need a script that converts the colabfold_search.sh output to a pickle. All of this is a bit complicated because complex MSAs are, by default, composed of sequences from two searches: one for the paired part and one for the unpaired part. Anyhow, we are currently discussing how we can build this feature.

Nov 17 '21 08:11 martin-steinegger

Related to this, I've been trying to run colabfold_batch from an existing a3m file, but it still tries to connect to the https://a3m.mmseqs.com/ server while asking for a local GPU. Maybe I'm missing something, but this seems redundant...

Is the MMseqs2 server used for the modelling? If so, why does it ask for a local GPU?
If the MMseqs2 server is only used during the MSA generation, why does it try to connect to it if the input is an a3m file?

Nov 29 '21 05:11 xvazquezc

@xvazquezc it should not use the MMseqs2 server if you provide an a3m input. Also, our a3m server does not do any folding; it only generates the input MSAs.

There is also an update on a3m input for complex prediction (see https://github.com/sokrypton/ColabFold/issues/76). However, we still do not have a colabfold_search integration, but you could build your own a3ms if you'd like.

Nov 29 '21 06:11 martin-steinegger

@martin-steinegger that's what I thought. I generated the a3m through colabfold_search.sh and then split_msas.py, but colabfold_batch still tries to connect. I know this is the case because when I run it as an HPC job it gets stuck (jobs can't access the Internet), and the errors at the end are all related to urllib3, i.e. typical URL connection issues. For example:

Traceback (most recent call last):
  File "/home/561/xc3587/miniconda3/envs/colabfold/bin/colabfold_batch", line 8, in <module>
    sys.exit(main())
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/batch.py", line 794, in main
    download_alphafold_params(is_complex, data_dir)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/download.py", line 29, in download_alphafold_params
    response = requests.get(url, stream=True)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
    conn = self._new_conn()
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)

Nov 29 '21 06:11 xvazquezc

It's not actually trying to connect to the MSA server; it fails while downloading the AlphaFold weights. You'll have to download them yourself and put them in, or symlink them to, ~/.cache/colabfold/params/ (or whatever you set your cache dir to). Make sure ~/.cache/colabfold/params/download_finished.txt also exists; that's the marker that colabfold is looking for.
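
A minimal sketch of that setup in Python, for anyone who wants to script it. The directory layout (a params/ folder inside the cache dir) and the download_finished.txt marker come from the description above; the function name link_params and the idea of keeping the downloaded files in a separate folder are only illustrative assumptions, and no particular weight file names are assumed.

```python
from pathlib import Path


def link_params(downloaded_dir: Path, cache_dir: Path) -> Path:
    """Symlink every file from downloaded_dir into <cache_dir>/params and
    create the download_finished.txt marker there.

    link_params is a hypothetical helper, not part of colabfold.
    """
    params_dir = cache_dir / "params"
    params_dir.mkdir(parents=True, exist_ok=True)

    for src in sorted(downloaded_dir.iterdir()):
        dst = params_dir / src.name
        # Skip anything that is already present (file or symlink).
        if src.is_file() and not dst.is_symlink() and not dst.exists():
            dst.symlink_to(src.resolve())

    # The marker file that colabfold looks for before trying to download.
    marker = params_dir / "download_finished.txt"
    marker.touch()
    return marker


# Example call (paths are placeholders):
# link_params(Path("/scratch/me/af2_weights"), Path.home() / ".cache" / "colabfold")
```

Run it once on a node with the weights already copied over; the compute job itself then should not need network access for the weights.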

Nov 29 '21 06:11 konstin

I have now manually downloaded the weights to avoid issues with the HPC. I tried to continue the processing with colabfold_batch from the old MSA I had generated with colabfold_search.sh, but I kept getting the error below. I even reinstalled the whole environment, just in case, and re-generated the MSA with the newer colabfold_search and colabfold_split_msas, but once again I got the same kind of error:

$ colabfold_batch --model-type AlphaFold2-multimer msas_Cat2_B2RPK0-P09429/ predictions_Cat2_B2RPK0-P09429/
f115d2822e32f943fab96db219a0f0ca3729b799
2021-11-30 16:46:38,377 Running colabfold 1.2.0 (f115d2822e32f943fab96db219a0f0ca3729b799)
2021-11-30 16:46:38,387 Found 5 citations for tools or databases
2021-11-30 16:46:47,068 Query 1/1: Cat2_B2RPK0-P09429 (length 427)
2021-11-30 16:46:48,250 Running model_3
Traceback (most recent call last):
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 883, in __getitem__
    field = self._fields[key]
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 807, in __getattr__
    return self[attribute]
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 889, in __getitem__
    raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'data'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/561/xc3587/miniconda3/envs/colabfold/bin/colabfold_batch", line 8, in <module>
    sys.exit(main())
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 1228, in main
    zip_results=args.zip,
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 1010, in run
    stop_at_score=stop_at_score,
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 189, in predict_structure
    use_templates,
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 122, in batch_input
    eval_cfg = model_config.data.eval
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 809, in __getattr__
    raise AttributeError(e)
AttributeError: "'data'"

Any idea what it might be? Thanks

Nov 30 '21 05:11 xvazquezc

Just to update: I was trying to run a complex, and this error disappears when adding the first line starting with #, as @martin-steinegger indicates in a recent comment in #76. However, I get a consistent error no matter how I format the a3m:

$ colabfold_batch --model-type AlphaFold2-multimer --pair-mode unpaired msas_test/unpaired_mod_trim_ext.a3m predictions_test_unpaired/
f115d2822e32f943fab96db219a0f0ca3729b799
2021-12-01 13:08:30,860 Running colabfold 1.2.0 (f115d2822e32f943fab96db219a0f0ca3729b799)
2021-12-01 13:08:30,877 Found 5 citations for tools or databases
2021-12-01 13:08:39,645 Query 1/1: unpaired_mod_trim_ext (length 426)
2021-12-01 13:08:39,669 Could not get MSA/templates for unpaired_mod_trim_ext: list index out of range
Traceback (most recent call last):
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 951, in run
    ) = unserialize_msa(a3m_lines, query_sequence)
  File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 796, in unserialize_msa
    paired_msa[j] += ">" + header_no_faster_split[j] + "\n"
IndexError: list index out of range
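
For anyone debugging the same thing, here is a small sketch that checks whether an a3m starts with a # header line before handing it to colabfold_batch. Only the "first line starts with #" part is taken from this thread; the exact layout assumed here (comma-separated chain lengths, a tab, then comma-separated copy numbers) is an assumption about the format discussed in #76 and should be checked against that issue. The function names are made up for illustration.

```python
from typing import List, Optional, Tuple


def parse_complex_header(first_line: str) -> Optional[Tuple[List[int], List[int]]]:
    """Parse a line assumed to look like '#<len>,<len>\t<n>,<n>'.

    Returns (chain_lengths, copy_numbers), or None if the line does not
    match that assumed layout.
    """
    line = first_line.rstrip("\r\n")
    if not line.startswith("#"):
        return None
    fields = line[1:].split("\t")
    if len(fields) != 2:
        return None
    try:
        lengths = [int(x) for x in fields[0].split(",")]
        copies = [int(x) for x in fields[1].split(",")]
    except ValueError:
        return None
    if len(lengths) != len(copies):
        return None
    return lengths, copies


def describe_a3m(a3m_text: str) -> str:
    """Return a one-line summary of the header, or a short problem report."""
    lines = a3m_text.splitlines()
    if not lines:
        return "empty input"
    header = parse_complex_header(lines[0])
    if header is None:
        return "missing or malformed '#' header line"
    lengths, copies = header
    return f"{len(lengths)} chain(s), lengths={lengths}, copies={copies}"
```

Calling describe_a3m(open("msas_test/unpaired_mod_trim_ext.a3m").read()) would at least tell a missing header apart from a problem further down in the file.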

Dec 01 '21 02:12 xvazquezc

colabfold_search as well as colabfold_batch support batch complex predictions. Just provide a FASTA or CSV file with your complex sequences. The following is an example.fasta:

>1
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK:PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
>2 
PIAQIHILEGRSDEQKE:PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK

You can search the databases, build the MSAs and predict the complex structures using the following commands:

colabfold_search example.fasta db msas
colabfold_batch msas predictions

Please update your local MMseqs2 to the newest version (see MMseqs2 repository).
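
If you have many complexes, the input file can be generated rather than written by hand. The sketch below only relies on the layout shown in the example above (one record per complex, chains joined with ":"); the function name and the dict-based input are illustrative assumptions.

```python
from typing import Dict, List


def complexes_to_fasta(complexes: Dict[str, List[str]]) -> str:
    """Build FASTA text with one record per complex, chains joined by ':'.

    complexes maps a record name to the list of chain sequences.
    """
    records = []
    for name, chains in complexes.items():
        if not chains or any(not chain for chain in chains):
            raise ValueError(f"complex {name!r} has a missing or empty chain")
        if any(":" in chain for chain in chains):
            raise ValueError(f"complex {name!r} has ':' inside a chain")
        records.append(f">{name}\n{':'.join(chains)}\n")
    return "".join(records)


# Example: write the file and then run the two commands above on it.
# with open("example.fasta", "w") as handle:
#     handle.write(complexes_to_fasta({"1": ["PIAQIHILEGRSDEQKE", "PIAQIHILEGRSDEQKE"]}))
```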

Feb 27 '22 15:02 martin-steinegger

Hello everyone,

I am also trying to run colabfold_batch on an HPC. I am able to run one sequence. However, when I try running two sequences sequentially, I get the following errors:

INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
Could not predict try1. Not Enough GPU memory? INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***

I am requesting the following resources: --gpu=1 --ntasks-per-node=1 --nodes=1 --mem=128GB

I am not sure if it is a problem with my software or if I am requesting the incorrect resources.

I am not very familiar with HPCs, so any help would be very much appreciated.

Thank you!

May 27 '22 15:05 vmischley

Did you figure out this AttributeError: "'data'" error? I tried running a single sequence and it ran the full job, but now I am trying multimer_v2 and got the same error:

Traceback (most recent call last):
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 903, in __getitem__
    field = self._fields[key]
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 827, in __getattr__
    return self[attribute]
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 909, in __getitem__
    raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'data'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/bin/colabfold_batch", line 8, in <module>
    sys.exit(main())
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/colabfold/batch.py", line 1752, in main
    stop_at_score_below=args.stop_at_score_below,
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/colabfold/batch.py", line 1385, in run
    random_seed=random_seed,
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/colabfold/batch.py", line 350, in predict_structure
    use_templates,
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/colabfold/batch.py", line 279, in batch_input
    eval_cfg = model_config.data.eval
  File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 829, in __getattr__
    raise AttributeError(e)
AttributeError: "'data'"

Aug 04 '22 16:08 jenchem
