ColabFold

Using embeddings (single residue and pairwise)

Open: duhovka opened this issue 3 years ago • 2 comments

How can I map the embedding back to the original sequence? For example, for an input sequence of 120 residues I got a single-residue embedding of shape 132x384. Does it include insertions from the MSA? Thanks!

duhovka, Jun 08 '22 15:06

The first 120 rows are the right ones; the remaining 12 are padding. We should have trimmed the representation. To avoid the padding, you can use --recompile-padding 1.0.
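A minimal NumPy sketch of trimming the padding, assuming the single representation has shape (L_padded, 384), the pairwise one has shape (L_padded, L_padded, C_z), and the true length is len(sequence) of the input FASTA sequence. The function trim_representations is hypothetical (not part of ColabFold), and the arrays are random placeholders, not real model output. The numbers in this thread (132 = 120 × 1.1) suggest a multiplicative padding factor of 1.1, but that is an inference, not something stated here.

```python
import numpy as np


def trim_representations(single: np.ndarray, pair: np.ndarray, seq_len: int):
    """Drop padded positions so index i refers to residue i of the input.

    Assumed layouts (hypothetical, check against your own files):
      single: (L_padded, C_s), e.g. (132, 384)
      pair:   (L_padded, L_padded, C_z)
    """
    if single.shape[0] < seq_len or pair.shape[0] < seq_len or pair.shape[1] < seq_len:
        raise ValueError("representation is shorter than the input sequence")
    single_trimmed = single[:seq_len]
    # The pair representation is padded along both residue axes.
    pair_trimmed = pair[:seq_len, :seq_len]
    return single_trimmed, pair_trimmed


# Placeholder arrays standing in for real ColabFold output.
sequence = "A" * 120  # hypothetical 120-residue input sequence
rng = np.random.default_rng(0)
single = rng.normal(size=(132, 384)).astype(np.float32)
pair = rng.normal(size=(132, 132, 128)).astype(np.float32)

single_120, pair_120 = trim_representations(single, pair, len(sequence))
print(single_120.shape, pair_120.shape)  # (120, 384) (120, 120, 128)
```

After trimming, row i of the single representation corresponds to residue i of the input sequence, and entry (i, j) of the pair representation to the residue pair (i, j). Slicing by the known sequence length avoids having to guess which rows are padding.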

martin-steinegger, Jun 16 '22 04:06

Thank you for pointing out the issue and providing the solution. I ran into the same issue: the generated single representation is longer than the protein FASTA sequence. I summed the single representation over its last dimension, np.sum(single_representation, 1), and found that the values for the extra positions are all identical, which is consistent with your explanation that these positions are padding. I think the fastest fix is to drop those positions.
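The sum-based check described above can be written as a small sanity test. This is a sketch under the assumption that padded positions yield identical per-row sums, as observed in this thread; padding_rows_identical is a hypothetical helper, and the data are synthetic. Trimming to the known sequence length remains the reliable fix; this only confirms that the rows being dropped look like padding.

```python
import numpy as np


def padding_rows_identical(single: np.ndarray, seq_len: int, atol: float = 1e-6) -> bool:
    """Return True if every position after seq_len has the same per-row sum.

    Assumption (from the observation in this thread): padded positions of the
    single representation all share one value when summed over channels.
    """
    row_sums = np.sum(single, axis=1)  # one sum per position, shape (L_padded,)
    pad_sums = row_sums[seq_len:]
    if pad_sums.size == 0:
        return True  # no padding to inspect
    return bool(np.allclose(pad_sums, pad_sums[0], atol=atol))


# Synthetic example: 120 "real" rows followed by 12 constant padding rows.
rng = np.random.default_rng(0)
single = np.vstack([rng.normal(size=(120, 384)), np.full((12, 384), 0.5)])
print(padding_rows_identical(single, 120))  # True
print(padding_rows_identical(single, 100))  # False: rows 100-119 are not padding
```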

pykao, Jun 16 '22 07:06