ColabFold
on running colabfold locally
Hi there,
We plan to run a few hundred complexes (some relatively big), and we've been trying to set up a pipeline that runs locally on our HPC (an actual local run, not remote access to the notebooks).
I've been able to run colabfold_search.sh locally thanks to @milot-mirdita's comment on #70. I used a few complexes as input (just dimers), and the MSA generation ran to completion. One of my questions is about the role of merge_and_split_msas.py, which, based on the comments in the script, is meant to separate the MSAs(?). However, I only got a single a3m file as output, named after the first FASTA header of the input (out of 20).
Another question: colabfold_batch still uses the Google Colab notebooks, not local resources, so it's not an option for us. I was wondering if there is an easy way to 'pickle' the MSA files into the default intermediate msa.pickle that the advanced notebooks used to generate. It is the standard input for localcolabfold, so it would make things easy, as we could run localcolabfold for the model generation part once the multimer option is released.
Thank you in advance.
Unfortunately, we currently do not have a solution for storing the paired+unpaired MSA for complex predictions. The msa.pickle might be a solution, but we would need a script that converts the colabfold_search.sh output to a pickle.
All of this is a bit complicated because complex MSAs are, by default, composed of sequences from two searches: one for the paired part and one for the unpaired part. Anyhow, we are currently discussing how we can build this feature.
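As a rough illustration of what such a conversion might involve, the sketch below reads a3m text (FASTA-like: `>` header lines followed by sequence lines) into (header, sequence) pairs and pickles an unpaired and a paired MSA together. The function names and the `"unpaired"`/`"paired"` dictionary layout are hypothetical: this is not the msa.pickle format the advanced notebooks or localcolabfold actually read, and it ignores the pairing logic described above.

```python
import pickle
from typing import Dict, List, Optional, Tuple


def parse_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse a3m text into (header, sequence) pairs.

    Lines starting with '#' (such as a complex header line) are skipped,
    and sequences spread over several lines are concatenated.
    """
    entries: List[Tuple[str, str]] = []
    header: Optional[str] = None
    seq_parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(seq_parts)))
            header = line[1:]
            seq_parts = []
        else:
            seq_parts.append(line)
    if header is not None:
        entries.append((header, "".join(seq_parts)))
    return entries


def pickle_msas(unpaired_a3m: str, paired_a3m: str) -> bytes:
    # Hypothetical layout for illustration only, NOT the notebooks' msa.pickle.
    data: Dict[str, List[Tuple[str, str]]] = {
        "unpaired": parse_a3m(unpaired_a3m),
        "paired": parse_a3m(paired_a3m),
    }
    return pickle.dumps(data)
```

Whatever layout a real converter uses, it would have to match exactly what the consuming tool unpickles, so this is only a starting point.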
Related to this, I've been trying to run colabfold_batch from a ready-made a3m file, but it still tries to connect to the https://a3m.mmseqs.com/ server while also asking for a local GPU. Maybe I'm missing something, but this seems redundant:
- Is the MMseqs2 server used for the modelling? If so, why does it ask for a local GPU?
- If the MMseqs2 server is only used during MSA generation, why does it try to connect to it when the input is an a3m file?
@xvazquezc It should not use the MMseqs2 server if you provide an a3m input. Also, our a3m server does not do any folding; it only generates the input MSAs.
There is also an update on a3m input for complex prediction (see https://github.com/sokrypton/ColabFold/issues/76). We still do not have colabfold_search integration for this, but you could build your own a3m files if you'd like.
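If you build your own a3m files for a complex, the file needs the first line starting with `#` discussed in #76. The sketch below assumes that line is the comma-separated lengths of the unique chains, a tab, then the comma-separated copy number (cardinality) of each unique chain, e.g. `#59,76<TAB>1,1` for a heterodimer. That format is an assumption to verify against #76 and the ColabFold source; the function name is hypothetical.

```python
from typing import Dict, List


def complex_a3m_header(chains: List[str]) -> str:
    """Build the '#' header line for a complex a3m (assumed format).

    Unique chains are kept in first-seen order; identical chains are
    collapsed into one entry whose cardinality counts the copies.
    """
    unique: List[str] = []
    copies: Dict[str, int] = {}
    for seq in chains:
        if seq not in copies:
            unique.append(seq)
            copies[seq] = 0
        copies[seq] += 1
    lengths = ",".join(str(len(seq)) for seq in unique)
    cardinalities = ",".join(str(copies[seq]) for seq in unique)
    return f"#{lengths}\t{cardinalities}"
```

For example, `complex_a3m_header(["PIAQ", "PIAQ", "MKV"])` returns `"#4,3\t2,1"`: a homodimer of the 4-residue chain plus one copy of the 3-residue chain.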
@martin-steinegger That's what I thought. I generated the a3m with colabfold_search.sh and then split_msas.py, but colabfold_batch still tries to connect. I know this because when I run it as an HPC job it gets stuck (jobs can't access the Internet), and the errors at the end are all related to urllib3, i.e. typical URL connection issues. For example:
Traceback (most recent call last):
File "/home/561/xc3587/miniconda3/envs/colabfold/bin/colabfold_batch", line 8, in <module>
sys.exit(main())
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/batch.py", line 794, in main
download_alphafold_params(is_complex, data_dir)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/colabfold/download.py", line 29, in download_alphafold_params
response = requests.get(url, stream=True)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
conn = self._new_conn()
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
sock.connect(sa)
It's not actually trying to connect to the MSA server; it fails while downloading the AlphaFold weights. You'll have to download them yourself and put or symlink them in ~/.cache/colabfold/params/ (or whatever you set your cache dir to). Make sure ~/.cache/colabfold/params/download_finished.txt also exists; that's the marker colabfold looks for.
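On a compute node without Internet access, the setup described above can be scripted. The sketch below assumes the weight files have already been downloaded and unpacked on a machine with network access into some directory (the `/scratch/alphafold_params` path in the usage line is hypothetical, as is the `link_params` name). It symlinks each file into the params directory and creates the `download_finished.txt` marker.

```python
from pathlib import Path


def link_params(source_dir: str, params_dir: str = "~/.cache/colabfold/params") -> Path:
    """Symlink pre-downloaded AlphaFold weight files into the ColabFold
    params directory and create the download_finished.txt marker."""
    src = Path(source_dir).expanduser().resolve()
    dst = Path(params_dir).expanduser()
    dst.mkdir(parents=True, exist_ok=True)
    for weight_file in src.iterdir():
        if not weight_file.is_file():
            continue
        target = dst / weight_file.name
        # Leave files (or symlinks, even dangling ones) that are already there.
        if target.exists() or target.is_symlink():
            continue
        target.symlink_to(weight_file)
    # Marker file that colabfold checks for, per the comment above.
    marker = dst / "download_finished.txt"
    marker.touch()
    return marker
```

Usage: `link_params("/scratch/alphafold_params")`, or pass a second argument if you use a different cache dir. Symlinks keep a single shared copy of the weights, which helps when several users or environments on the cluster need them.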
Now that I've manually downloaded the weights to avoid issues with the HPC, I tried to continue processing with colabfold_batch from the old MSA I had generated with colabfold_search.sh, but I kept getting the error below. I even reinstalled the whole environment, just in case, and regenerated the MSA with the newer colabfold_search and colabfold_split_msas, but once again I got the same kind of error:
$ colabfold_batch --model-type AlphaFold2-multimer msas_Cat2_B2RPK0-P09429/ predictions_Cat2_B2RPK0-P09429/
f115d2822e32f943fab96db219a0f0ca3729b799
2021-11-30 16:46:38,377 Running colabfold 1.2.0 (f115d2822e32f943fab96db219a0f0ca3729b799)
2021-11-30 16:46:38,387 Found 5 citations for tools or databases
2021-11-30 16:46:47,068 Query 1/1: Cat2_B2RPK0-P09429 (length 427)
2021-11-30 16:46:48,250 Running model_3
Traceback (most recent call last):
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 883, in __getitem__
field = self._fields[key]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 807, in __getattr__
return self[attribute]
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 889, in __getitem__
raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'data'"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/561/xc3587/miniconda3/envs/colabfold/bin/colabfold_batch", line 8, in <module>
sys.exit(main())
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 1228, in main
zip_results=args.zip,
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 1010, in run
stop_at_score=stop_at_score,
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 189, in predict_structure
use_templates,
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 122, in batch_input
eval_cfg = model_config.data.eval
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 809, in __getattr__
raise AttributeError(e)
AttributeError: "'data'"
Any idea what it might be? Thanks
Just to update: I was trying to run a complex, and this error disappears when I add a first line starting with #, as @martin-steinegger indicates in a recent comment in #76. However, I get a consistent error no matter how I format the a3m:
$ colabfold_batch --model-type AlphaFold2-multimer --pair-mode unpaired msas_test/unpaired_mod_trim_ext.a3m predictions_test_unpaired/
f115d2822e32f943fab96db219a0f0ca3729b799
2021-12-01 13:08:30,860 Running colabfold 1.2.0 (f115d2822e32f943fab96db219a0f0ca3729b799)
2021-12-01 13:08:30,877 Found 5 citations for tools or databases
2021-12-01 13:08:39,645 Query 1/1: unpaired_mod_trim_ext (length 426)
2021-12-01 13:08:39,669 Could not get MSA/templates for unpaired_mod_trim_ext: list index out of range
Traceback (most recent call last):
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 951, in run
) = unserialize_msa(a3m_lines, query_sequence)
File "/home/561/xc3587/miniconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/batch.py", line 796, in unserialize_msa
paired_msa[j] += ">" + header_no_faster_split[j] + "\n"
IndexError: list index out of range
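The IndexError above is raised while colabfold parses the a3m. Before rerunning, a quick sanity check of the file's first lines can rule out some formatting mistakes. The sketch below assumes the same `#` header format as guessed for #76 (unique-chain lengths, a tab, cardinalities) and that the query entry follows the header as a single line holding the unique chains concatenated. Both are assumptions, and the function name is hypothetical. It does not check the per-chain headers of paired entries, which is where the failing line in the traceback indexes.

```python
from typing import List, Tuple


def check_complex_a3m(a3m_text: str) -> Tuple[List[int], List[int]]:
    """Sanity-check the start of a complex a3m (assumed format).

    Returns (lengths, cardinalities) parsed from the '#' header line and
    raises ValueError if the header or the query sequence looks wrong.
    """
    lines = [line.strip() for line in a3m_text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("missing '#' header line")
    fields = lines[0][1:].split("\t")
    if len(fields) != 2:
        raise ValueError("header must be '<lengths><TAB><cardinalities>'")
    lengths = [int(x) for x in fields[0].split(",")]
    cardinalities = [int(x) for x in fields[1].split(",")]
    if len(lengths) != len(cardinalities):
        raise ValueError("number of lengths and cardinalities differ")
    if len(lines) < 3 or not lines[1].startswith(">"):
        raise ValueError("missing query entry after the header")
    # Assumes the query sequence sits on one line (no line wrapping).
    query = lines[2]
    if len(query) != sum(lengths):
        raise ValueError(
            f"query length {len(query)} != sum of chain lengths {sum(lengths)}"
        )
    return lengths, cardinalities
```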
Both colabfold_search and colabfold_batch support batch complex predictions. Just provide a FASTA or CSV file with your complex sequences, with the chains of each complex separated by ':'. Here is an example.fasta:
>1
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK:PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
>2
PIAQIHILEGRSDEQKE:PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
You can search the databases, build the MSAs and predict the complex structures using the following commands:
colabfold_search example.fasta db msas
colabfold_batch msas predictions
Please update your local MMseqs2 to the newest version (see the MMseqs2 repository).
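With a few hundred complexes, as in the original question, it can be convenient to generate such a FASTA file programmatically. The sketch below writes one entry per complex, joining its chains with ':' in the same layout as example.fasta. The function name and the input dictionary are hypothetical.

```python
from typing import Dict, List


def write_complex_fasta(complexes: Dict[str, List[str]], path: str) -> None:
    """Write one FASTA entry per complex, joining its chains with ':'."""
    with open(path, "w") as handle:
        for name, chains in complexes.items():
            if not chains:
                raise ValueError(f"complex {name!r} has no chains")
            handle.write(f">{name}\n{':'.join(chains)}\n")
```

The resulting file can then be passed to colabfold_search and colabfold_batch in the same way as example.fasta.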
Hello everyone,
I am also trying to run colabfold_batch on an HPC. I am able to run one sequence. However, when I try running two sequences sequentially, I get the following errors:
INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED Could not predict try1. Not Enough GPU memory? INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
I am requesting the following resources: --gpu=1 --ntasks-per-node= 1 --nodes=1 --mem=128GB
I am not sure whether it is a problem with my software or with the resources I am requesting.
I am not very familiar with HPCs, so any help would be much appreciated.
Thank you!
Did you figure this one out? I ran a single sequence and the full job completed, but now I am trying multimer_v2 and got the same KeyError: 'data' error:
Traceback (most recent call last):
File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 903, in __getitem__
field = self._fields[key]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 827, in __getattr__
return self[attribute]
File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py", line 909, in __getitem__
raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'data'"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/uufs/chpc.utah.edu/sys/installdir/colabfold/070622/conda/bin/colabfold_batch", line 8, in