inquiry regarding a3m input format

Open zjq1011 opened this issue 2 years ago • 1 comments

Hello, what is the correct a3m input format for a complex?

I have successfully used an a3m file as input for monomer prediction, both in the local version of ColabFold and in the AF2_batch notebook. Now I want to predict a heterodimer, and I have an a3m file for each chain. I tried combining them into one file, and it returns an error:

2022-08-05 11:23:46,076 Could not generate input features 35_1: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
Traceback (most recent call last):
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1350, in run
    (input_features, domain_names) = generate_input_feature(
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1017, in generate_input_feature
    feature_dict = build_monomer_feature(
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 871, in build_monomer_feature
    **pipeline.make_msa_features([msa]),
  File "/home/fbsb2/miniconda3/envs/colabfold/lib/python3.9/site-packages/alphafold/data/pipeline.py", line 79, in make_msa_features
    features['deletion_matrix_int'] = np.array(deletion_matrix, dtype=np.int32)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
2022-08-05 11:23:46,077 Done

The combined a3m file looks like:

name1
sequence1
hits
name2
sequence2
hits

Thank you in advance.

zjq1011, Aug 05 '22 04:08
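For readers hitting the same traceback: the ValueError is raised while the per-sequence deletion rows are packed into one rectangular array, which fails when the rows have different lengths. A likely cause when two monomer a3m files are simply concatenated is that the rows of the first alignment and the rows of the second have different numbers of match columns. The following is a minimal diagnostic sketch, not part of ColabFold; the helper names and toy sequences are made up, and it assumes one sequence per line.

```python
def match_columns(aligned_row: str) -> int:
    """Count a3m match columns: every character except lowercase insertions."""
    return sum(1 for ch in aligned_row if not ch.islower())


def row_widths(a3m_text: str) -> list:
    """Return the match-column count of every sequence row in an a3m string."""
    widths = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">") or line.startswith("#"):
            continue  # skip blank lines, name lines and comment lines
        widths.append(match_columns(line))
    return widths


# Two toy monomer alignments of different lengths (made-up sequences).
a3m_a = ">name1\nMKTAY\n>hit1\nMK-AY\n"
a3m_b = ">name2\nGSHMLEV\n>hit2\nGSaHMLEV\n"

widths = row_widths(a3m_a + a3m_b)
print(widths)                # widths of the naively concatenated file
print(len(set(widths)) > 1)  # True means the rows are ragged
```

If the widths differ, the file cannot be read as a single monomer alignment, which is consistent with the "inhomogeneous shape" message.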

Possibly related: I hit a slightly different error involving multimer and "features". In my hacky setup I first run colabfold.batch.run on a machine with internet access, asking for zero models. This generates an a3m alignment and then stumbles on the zero, which triggers submission to a node without internet access, where colabfold.batch.run is called again with the generated a3m as a custom MSA (I will not comment on my current SGE priority level 🤣). This works with the pair_mode="unpaired+paired" argument, but not with pair_mode="paired". Bizarrely, an A3M file generated by MMseqs2 with pair_mode="paired" and then resubmitted for AlphaFold inference with pair_mode="unpaired+paired" will crash.

An easy fix is to change the code around line 1017 in colabfold.batch.generate_input_feature to:

if unpaired_msa is None or unpaired_msa[sequence_index] == '':
    input_msa = ">" + str(101 + sequence_index) + "\n" + sequence
else:
    input_msa = unpaired_msa[sequence_index]

Otherwise, the colabfold.batch.build_monomer_feature call will pass a blank MSA to alphafold.data.pipeline.make_msa_features (here the alphafold module is definitely the drop-in replacement from the alphafold-colabfold package, from the steineggerlab/alphafold repo, as opposed to Google's one).
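As a standalone illustration of that fallback, here is a minimal sketch outside ColabFold. The function name pick_input_msa is hypothetical; only the branch logic mirrors the snippet above, and the 101 offset is taken from it.

```python
from typing import List, Optional


def pick_input_msa(unpaired_msa: Optional[List[str]],
                   sequence_index: int,
                   sequence: str) -> str:
    """Return the unpaired MSA for one chain, or a query-only MSA if it is missing or blank."""
    if unpaired_msa is None or unpaired_msa[sequence_index] == "":
        # Fall back to a single-sequence alignment so downstream code never sees a blank MSA.
        return ">" + str(101 + sequence_index) + "\n" + sequence
    return unpaired_msa[sequence_index]


# Toy inputs: the first chain has an MSA, the second one is blank.
msas = [">101\nMKTAY\n>hit\nMK-AY", ""]
print(pick_input_msa(msas, 0, "MKTAY"))
print(pick_input_msa(msas, 1, "GSHMLEV"))
print(pick_input_msa(None, 0, "MKTAY"))
```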

In writing this up, I see that what I actually have is input_msa = ">" + '\t'.join(str(101 + i) for i in range(query_seqs_cardinality)) + "\n" + sequence, which rebuilds the first line, but I believe there is no difference, assuming the first-line comment is correct.
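For context on that first line, here is a hedged sketch of how the head of a complex a3m appears to be laid out in ColabFold-generated files: a comment line with the chain lengths and copy numbers, a name line with tab-separated IDs starting at 101, and the concatenated query. This is an assumption based on files written by ColabFold itself and should be checked against the installed version; the function name complex_a3m_header is hypothetical.

```python
def complex_a3m_header(query_seqs_unique, query_seqs_cardinality):
    """Build the first three lines of a complex a3m (layout is an assumption, see above)."""
    # Comment line: '#' + comma-separated chain lengths + tab + comma-separated copy numbers.
    lengths = ",".join(str(len(seq)) for seq in query_seqs_unique)
    copies = ",".join(str(count) for count in query_seqs_cardinality)
    comment_line = "#" + lengths + "\t" + copies
    # Name line: one ID per unique chain, numbered from 101 and joined with tabs.
    name_line = ">" + "\t".join(str(101 + i) for i in range(len(query_seqs_unique)))
    # Query line: the unique chains concatenated without a separator.
    query_line = "".join(query_seqs_unique)
    return "\n".join([comment_line, name_line, query_line]) + "\n"


# Toy heterodimer with made-up sequences, one copy of each chain.
print(repr(complex_a3m_header(["MKTAY", "GSHMLEV"], [1, 1])))
```

The paired and unpaired hit rows that follow these lines are not covered here; rows for one chain would need gap padding across the other chain so that every row has the same number of match columns.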

matteoferla, Aug 09 '22 19:08