ColabFold
inquiry regarding a3m input format
Hello, I would like to know the correct a3m input format for a complex.
I have succeeded in using an a3m file as input for monomer prediction, both in the local version of ColabFold and in the AF2_batch notebook. Now I want to predict a heterodimer, and I have an a3m file for each chain. I tried combining them into one file, but that returns an error:
2022-08-05 11:23:46,076 Could not generate input features 35_1: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
Traceback (most recent call last):
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1350, in run
    (input_features, domain_names) = generate_input_feature(
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1017, in generate_input_feature
    feature_dict = build_monomer_feature(
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 871, in build_monomer_feature
    **pipeline.make_msa_features([msa]),
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/alphafold/data/pipeline.py", line 79, in make_msa_features
    features['deletion_matrix_int'] = np.array(deletion_matrix, dtype=np.int32)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
2022-08-05 11:23:46,077 Done
The combined a3m file looks like:
name1
sequence1
hits
name2
sequence2
hits
Thank you in advance.
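A quick way to see why a naively concatenated file can trigger this error: the traceback ends in np.array(deletion_matrix, dtype=np.int32), which needs every row of the MSA to span the same number of alignment columns. If two a3m files for chains of different lengths are pasted together and parsed as one monomer MSA, the rows are likely ragged. The sketch below is a diagnostic only, not part of ColabFold; aligned_length and ragged_rows are hypothetical helper names, and it assumes the usual a3m convention that lowercase letters are insertions that do not occupy a column.

```python
def aligned_length(row: str) -> int:
    """Count alignment columns in one a3m row.

    Assumes the a3m convention: lowercase letters are insertions and
    occupy no column; uppercase letters and '-' each occupy one.
    """
    return sum(1 for ch in row if not ch.islower())


def ragged_rows(a3m_text: str):
    """Return (header, aligned_length) for rows that disagree with the first row."""
    entries = []  # each entry is [header, sequence]
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            entries.append([line[1:], ""])
        elif entries:
            entries[-1][1] += line  # sequences may wrap over several lines
    if not entries:
        return []
    expected = aligned_length(entries[0][1])
    return [
        (header, aligned_length(seq))
        for header, seq in entries
        if aligned_length(seq) != expected
    ]


# Two toy "chains" of different lengths pasted into one file.
combined = ">name1\nMKV\n>hit1\nMaK-\n>name2\nMKVLA\n"
print(ragged_rows(combined))
```

Any row reported here has a different column count from the first sequence, which is the kind of input that makes NumPy refuse to build a rectangular array.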
Possibly related: I had a slightly different error involving multimer and "features". In my hacky setup I run the colabfold.batch.run command in a location with internet access, asking for zero models. This generates an a3m alignment and then stumbles on the zero, which triggers the submission to a node without internet access, where colabfold.batch.run is called again with the generated a3m as a custom MSA (I will not comment on my current SGE priority level 🤣).
This works with the pair_mode="unpaired+paired" argument, but not with pair_mode="paired". Bizarrely, an MMseqs2-generated A3M file made with pair_mode="paired" and resubmitted for AlphaFold inference with pair_mode="unpaired+paired" will crash.
An easy fix is to change the code at around line 1017 in colabfold.batch.generate_input_feature:
    if unpaired_msa is None or unpaired_msa[sequence_index] == '':
        input_msa = ">" + str(101 + sequence_index) + "\n" + sequence
    else:
        input_msa = unpaired_msa[sequence_index]
Otherwise, the colabfold.batch.build_monomer_feature call will send alphafold.data.pipeline.make_msa_features a blank MSA (here the alphafold module is definitely the drop-in replacement from the alphafold-colabfold package in the steineggerlab/alphafold repo, as opposed to Google's one).
In writing this, I actually have
    input_msa = ">" + '\t'.join(str(101 + i) for i in range(query_seqs_cardinality)) + "\n" + sequence
thus remaking the first line, but I believe there's no difference, assuming the first-line comment is correct.
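For anyone who wants to try the fallback outside ColabFold, here is a minimal standalone sketch of the same logic. The function names choose_input_msa and tab_joined_header and the parameter n_chains are hypothetical; they are not part of colabfold.batch, and the sketch only mirrors the condition quoted above.

```python
from typing import List, Optional


def choose_input_msa(
    unpaired_msa: Optional[List[str]], sequence_index: int, sequence: str
) -> str:
    """Fall back to a single-sequence MSA when no unpaired MSA is available.

    Mirrors the patched condition: a missing list or an empty string for
    this chain is replaced by a one-entry a3m holding only the query.
    """
    if unpaired_msa is None or unpaired_msa[sequence_index] == "":
        return ">" + str(101 + sequence_index) + "\n" + sequence
    return unpaired_msa[sequence_index]


def tab_joined_header(n_chains: int) -> str:
    """Build a header such as '>101<TAB>102' for n_chains chains.

    The ids must be converted to str before joining; joining ints raises
    a TypeError.
    """
    return ">" + "\t".join(str(101 + i) for i in range(n_chains))


# Chain 0 has an empty unpaired MSA (as with pair_mode="paired"), chain 1 does not.
msas = ["", ">102\nGGS\n>hit\nGAS"]
print(choose_input_msa(msas, 0, "MKV"))
print(choose_input_msa(msas, 1, "GGS"))
```

With this fallback, the feature builder always receives at least the query sequence instead of an empty string.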