foldseek

Brief description of output columns?

Open rezahay opened this issue 3 years ago • 3 comments

Thanks a lot for this wonderful tool.

Unfortunately, I couldn't find a brief description of the output format. The output I downloaded lacks column headers, so I don't know which data belongs to which column. I tried to guess some of them (seqident, ...) but without success, and looking at the MMseqs2 site didn't help either. It would be very helpful if the README contained a brief explanation of the output format with an example.

Thanks in advance, Reza

rezahay • Jul 24 '22 13:07

Essentially it's the same format as described here: https://github.com/soedinglab/MMseqs2/wiki#custom-alignment-format-with-convertalis

However, in TMalign mode, the evalue and bits columns both contain the TM-score.

milot-mirdita • Jul 24 '22 13:07
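A minimal Python sketch of reading the standard twelve columns, assuming tab-separated output in the default column order (query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits) described on the MMseqs2 wiki. The names `parse_m8`, `DEFAULT_COLUMNS` and the `tmalign` flag are hypothetical; following the reply above, in TMalign mode the evalue and bits columns are simply reinterpreted as the TM-score.

```python
from typing import Dict, Union

# Assumed default column order (MMseqs2 convertalis / BLAST-tab style).
DEFAULT_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]
INT_COLUMNS = {"alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend"}
FLOAT_COLUMNS = {"fident", "evalue", "bits"}


def parse_m8(line: str, tmalign: bool = False) -> Dict[str, Union[str, int, float]]:
    """Parse one tab-separated hit line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(DEFAULT_COLUMNS):
        raise ValueError(f"expected {len(DEFAULT_COLUMNS)} columns, got {len(fields)}")
    record: Dict[str, Union[str, int, float]] = {}
    for name, value in zip(DEFAULT_COLUMNS, fields):
        if name in INT_COLUMNS:
            record[name] = int(value)
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = value
    if tmalign:
        # In TMalign mode both evalue and bits carry the TM-score,
        # so expose it under an explicit (hypothetical) key.
        record["tmscore"] = record["bits"]
    return record
```

For example, `parse_m8(line, tmalign=True)["tmscore"]` would return the TM-score of a hit from a TMalign-mode search.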

Thanks for your quick response. The output file contains multiple records such as:

job.pdb AF-P0A1G5-F1-model_v2.pdb.gz 100.000 41 0 0 1 41 111 151 6.180E-05 209 41 151 FGYCESCGVEIGIRRLEARPTADLCIDCKTLAEIREKQMAG FGYCESCGVEIGIRRLEARPTADLCIDCKTLAEIREKQMAG -8.680,37.207,21.983,-9.232,34.623,19.132,-6.462,34.101,16.601,-7.603,30.809,15.047,-4.576,28.764,13.940,-5.145,28.262,10.189,-3.572,24.760,9.805,-2.046,24.647,6.287,-3.639,21.502,4.799,-0.804,20.174,2.623,-2.098,19.840,-0.977,-0.437,16.358,-0.744,-2.991,15.032,1.816,-5.168,12.055,0.679,-8.229,14.155,1.703,-7.154,17.029,-0.617,-6.738,14.552,-3.562,-10.300,13.315,-2.921,-11.776,16.858,-2.471,-13.106,15.725,0.959,-13.102,17.329,4.416,-12.379,15.266,7.589,-15.460,13.246,8.674,-17.324,14.581,11.733,-18.419,11.675,13.972,-21.823,12.190,15.665,-22.359,11.053,19.308,-23.729,7.452,19.214,-23.142,7.014,15.449,-22.756,3.358,14.380,-19.219,2.542,13.198,-18.939,1.957,9.403,-22.452,3.081,8.510,-23.317,3.812,4.828,-22.197,7.491,5.160,-18.755,6.491,6.585,-18.271,3.726,3.953,-19.261,6.191,1.156,-16.721,8.710,2.566,-13.849,6.148,2.541,-14.925,4.772,-0.889,-14.644,8.307,-2.386,-11.183,8.740,-0.782,-9.893,5.333,-2.001,-11.281,5.813,-5.568,-9.837,9.345,-5.923,-6.457,8.189,-4.537,-6.461,5.115,-6.868,-7.240,7.327,-9.902,-4.496,9.874,-9.011,-1.935,7.021,-8.604,-3.012,5.583,-12.022,-2.537,9.000,-13.687,0.937,9.259,-12.024,1.974,5.756,-13.222,0.781,6.588,-16.794,2.755,9.918,-16.767,5.923,8.166,-15.485,5.587,5.492,-18.234,5.355,8.289,-20.866,8.410,10.084,-19.351,10.471,6.837,-19.320,9.526,6.237,-23.019,10.674,9.768,-23.993,13.970,9.296,-22.043,14.552,5.890,-23.729,14.081,7.551,-27.179,16.871,10.089,-26.342,20.397,8.609,-26.559,22.538,10.284,-23.830,26.093,11.471,-24.436,28.741,9.017,-23.017,29.348,11.410,-20.035,25.611,11.712,-19.111,24.674,8.005,-19.650,25.500,7.129,-16.010,23.222,9.906,-14.623,20.299,8.964,-16.941,20.655,5.226,-16.038,20.564,6.145,-12.292,17.423,8.340,-12.759,15.674,5.659,-14.873,16.499,3.030,-12.184,14.901,5.238,-9.461,11.765,5.914,-11.579,11.409,2.168,-12.325,11.625,1.308,-8.563,9.124,4.051,-7.585,6.617,
2.987,-10.305,6.785,-0.635,-9.054,5.975,0.634,-5.478,3.020,2.718,-6.804,1.664,-0.340,-8.715,1.858,-2.375,-5.450,0.071,0.497,-3.635,-2.718,0.600,-6.324,-3.229,-3.189,-5.826,-3.374,-2.560,-2.040,-6.021,0.201,-2.521,-8.038,-2.157,-4.813,-7.878,-4.836,-2.048,-9.064,-2.180,0.473,-11.933,-1.097,-1.872,-13.115,-4.756,-1.872,-13.175,-4.679,1.983,-15.395,-1.541,1.747,-17.739,-3.405,-0.686,-17.759,-6.549,1.556,-18.621,-4.391,4.691,-15.350,-5.688,6.398,-13.648,-2.244,6.349,-12.303,-0.342,9.393,-11.420,-3.264,11.743,-7.962,-4.467,12.865,-7.042,-7.791,11.130,-5.113,-8.897,14.309,-7.547,-8.106,17.184,-10.928,-7.069,15.620,-10.768,-3.587,17.288,-12.137,-0.580,15.343,-9.541,1.600,13.594,-10.333,5.234,14.496,-11.820,7.288,11.593,-9.061,9.977,11.675,-6.457,7.168,11.348,-8.237,5.713,8.283,-8.433,9.269,6.786,-4.662,9.639,7.270,-4.089,6.109,5.827,-7.205,4.359,4.389,-5.180,1.177,3.547,-4.450,0.531,7.282,-4.979,-3.180,8.179,-3.971,-3.081,11.920,-4.844,-0.869,14.960,-2.065,1.455,16.299,-1.059,-1.033,19.055,-0.890,-4.116,16.769,1.002,-2.040,14.139,3.526,-0.835,16.783,4.024,-4.419,18.071,4.538,-5.640,14.462,7.248,-2.943,13.886,9.028,-3.871,17.181,8.992,-7.580,16.126,10.255,-6.666,12.609,13.282,-4.810,14.133,14.053,-7.781,16.454,13.768,-10.443,13.673,15.546,-8.511,10.851,18.808,-7.977,12.855 MQEGQNRKTSSLSILAIAGVEPYQEKPGEEYMNEAQLSHFKRILEAWRNQLRDEVDRTVTHMQDEAANFPDPVDRAAQEEEFSLELRNRDRERKLIKKIEKTLKKVEDEDFGYCESCGVEIGIRRLEARPTADLCIDCKTLAEIREKQMAG 99287 Salmonella enterica subsp. enterica serovar Typhimurium str. LT2

which seems to differ from: targetID alnScore seqIdentity eVal qStart qEnd qLen tStart tEnd tLen [queryOrfStart] [queryOrfEnd] [dbOrfStart] [dbOrfEnd] [alnCigar]

I know these MMseqs2 columns, but the record above is different. Could you give me a hint on how to interpret the columns of the record above?

rezahay • Jul 24 '22 14:07

The API m8 has additional columns to enable all the features of the web visualization.

The first twelve are the normal output: query,target,pident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits

Then we add the sequence lengths, the alignments, the target C-alpha coordinates and the target sequence: qlen,tlen,qaln,taln,tca,tseq

And if a database with taxonomy information was searched, we also add this information: taxid,taxname

milot-mirdita • Jul 24 '22 16:07
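The column list in the reply above can be turned into a small parser. This is a sketch assuming tab-separated fields and exactly the 18 columns listed (20 when taxid and taxname are present); the names `API_COLUMNS`, `parse_api_m8` and `split_tca` are hypothetical, and grouping tca into (x, y, z) triples is an assumption based on the example record, where the coordinates appear as one flat comma-separated list.

```python
from typing import Dict, List, Tuple

# Column order as listed in the reply above (hypothetical constant name).
API_COLUMNS = [
    "query", "target", "pident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
    "qlen", "tlen", "qaln", "taln", "tca", "tseq",
]
TAXONOMY_COLUMNS = ["taxid", "taxname"]


def parse_api_m8(line: str) -> Dict[str, str]:
    """Map one tab-separated API m8 line to column names (values kept as strings)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == len(API_COLUMNS):
        names = API_COLUMNS
    elif len(fields) == len(API_COLUMNS) + len(TAXONOMY_COLUMNS):
        names = API_COLUMNS + TAXONOMY_COLUMNS
    else:
        raise ValueError(f"unexpected column count: {len(fields)}")
    return dict(zip(names, fields))


def split_tca(tca: str) -> List[Tuple[float, float, float]]:
    """Group a flat 'x,y,z,x,y,z,...' string into (x, y, z) tuples."""
    # Drop a trailing comma, as seen at the end of the example's tca field.
    values = [float(v) for v in tca.strip(",").split(",") if v]
    if len(values) % 3 != 0:
        raise ValueError("coordinate count is not a multiple of 3")
    return [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]
```

Splitting on tabs rather than on arbitrary whitespace matters here: taxname (e.g. "Salmonella enterica subsp. enterica ...") contains spaces and would otherwise be broken into several columns.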

@milot-mirdita it seems like there's a missing column between tend and evalue, as there are 21 columns. Do you have any insight into what it might be?

ronboger • Mar 02 '23 20:03

See my answer in the other thread:

We recently added the Foldseek match probability (prob). Also, we return pident rather than fident, and for some of the databases we recently started returning theader instead of target, so that the full header is included.

milot-mirdita • Mar 03 '23 08:03
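Putting the replies together, one way to cope with both the older and the newer API output is to choose a column layout by counting tab-separated fields. This is a sketch under two assumptions: that prob sits between tend and evalue (as the question above suggests), and that the four possible layouts are exactly those with or without prob and with or without taxonomy. Column 2 may hold either target or theader depending on the database, so it is keyed as "target" here. The names `LAYOUTS` and `layout_for` are hypothetical.

```python
from typing import Dict, List

# Column 2 may hold either the target ID or the full target header (theader),
# depending on the database; it is keyed as "target" either way.
HEAD = ["query", "target", "pident", "alnlen", "mismatch", "gapopen",
        "qstart", "qend", "tstart", "tend"]
SCORES_OLD = ["evalue", "bits"]
SCORES_NEW = ["prob", "evalue", "bits"]  # assumed position of prob
TAIL = ["qlen", "tlen", "qaln", "taln", "tca", "tseq"]
TAXONOMY = ["taxid", "taxname"]

# Each candidate layout has a distinct length (18, 19, 20, 21),
# so the field count alone identifies it.
LAYOUTS: Dict[int, List[str]] = {
    len(cols): cols
    for cols in (
        HEAD + SCORES_OLD + TAIL,
        HEAD + SCORES_NEW + TAIL,
        HEAD + SCORES_OLD + TAIL + TAXONOMY,
        HEAD + SCORES_NEW + TAIL + TAXONOMY,
    )
}


def layout_for(line: str) -> List[str]:
    """Return the assumed column names for a tab-separated API m8 line."""
    n = len(line.rstrip("\n").split("\t"))
    try:
        return LAYOUTS[n]
    except KeyError:
        raise ValueError(f"no known layout with {n} columns") from None
```

A record can then be read with `dict(zip(layout_for(line), line.rstrip("\n").split("\t")))`. If the server adds further columns in the future, the count-based lookup fails loudly instead of silently shifting values into the wrong columns.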