DiaNN
Confusion about "Protein.Group" and "Protein.Ids" column
Hello Vadim, I am a little puzzled by the main output from DIANN (report.tsv).
I always thought that the "Protein.Group" column lists the "winner" protein(s), i.e. the protein group. While this may not be the parsimony principle as such, there is a logic behind which proteins are the "winners": all peptides of a group of proteins are taken into consideration. The "Protein.Ids" column, by contrast, would report all proteins in which the peptide in question appears.
Now, in one of my searches I have a fasta file with two completely identical protein sequences (identical sequence and length) but with different accessions and header lines. (I agree that this does not make too much sense, but it should not be problematic, and it can even be real that two identical proteins are encoded at different loci.) I would expect both of these proteins to be listed in the "Protein.Group" column for all peptides identified for these proteins.
Here I find only one protein in Protein.Group (although this column sometimes does contain two proteins separated by a semicolon?): WP_014262366.1
In Protein.Ids, however, both are listed: ADW16141.1;WP_014262366.1
Any explanation for this behaviour?
Best regards, Jonas
From report.tsv output:
File.Name Run Protein.Group Protein.Ids Protein.Names Genes PG.Quantity PG.Normalised PG.MaxLFQ Genes.Quantity Genes.Normalised Genes.MaxLFQ Genes.MaxLFQ.Unique Modified.Sequence
/scratch/DIANN_A314/WU305725/20240709_C35673_003r_S722096_Fa1_2_Group_1.mzML 20240709_C35673_003r_S722096_Fa1_2_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 82334.9 82334.9 74703.5 2.4349e+07 2.4349e+07 1.54694e+07 DIFNFISR
/scratch/DIANN_A314/WU305725/20240709_C35673_003r_S722096_Fa1_2_Group_1.mzML 20240709_C35673_003r_S722096_Fa1_2_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 82334.9 82334.9 74703.5 2.4349e+07 2.4349e+07 1.54694e+07 IAYDLILTSK
/scratch/DIANN_A314/WU305725/20240709_C35673_003r_S722096_Fa1_2_Group_1.mzML 20240709_C35673_003r_S722096_Fa1_2_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 82334.9 82334.9 74703.5 2.4349e+07 2.4349e+07 1.54694e+07 IFAGAGNDR
/scratch/DIANN_A314/WU305725/20240709_C35673_003r_S722096_Fa1_2_Group_1.mzML 20240709_C35673_003r_S722096_Fa1_2_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 82334.9 82334.9 74703.5 2.4349e+07 2.4349e+07 1.54694e+07 IFAITNNDLGEDVEK
/scratch/DIANN_A314/WU305725/20240709_C35673_011r_S722097_Fa1_3_Group_1.mzML 20240709_C35673_011r_S722097_Fa1_3_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 47893.8 47893.8 52838.7 1.27546e+07 1.27546e+07 1.53814e+07 DIFNFISR
/scratch/DIANN_A314/WU305725/20240709_C35673_011r_S722097_Fa1_3_Group_1.mzML 20240709_C35673_011r_S722097_Fa1_3_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 47893.8 47893.8 52838.7 1.27546e+07 1.27546e+07 1.53814e+07 IAYDLILTSK
/scratch/DIANN_A314/WU305725/20240709_C35673_011r_S722097_Fa1_3_Group_1.mzML 20240709_C35673_011r_S722097_Fa1_3_Group_1 WP_014262366.1 ADW16141.1;WP_014262366.1 hypothetical hypothetical 47893.8 47893.8 52838.7 1.27546e+07 1.27546e+07 1.53814e+07 IFAITNNDLGEDVEK
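To see how widespread this is across a whole report, something like the following could list every row where Protein.Ids contains accessions that are missing from Protein.Group. This is only a minimal sketch: it assumes report.tsv is tab-separated with the column names shown in the header above, and the function name `rows_with_dropped_ids` is made up for illustration.

```python
import csv
import io


def rows_with_dropped_ids(tsv_handle):
    """Yield (Run, Modified.Sequence, Protein.Group, missing accessions)
    for rows where Protein.Ids lists accessions absent from Protein.Group."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    for row in reader:
        group = set(row["Protein.Group"].split(";"))
        ids = set(row["Protein.Ids"].split(";"))
        missing = sorted(ids - group)
        if missing:
            yield (row["Run"], row["Modified.Sequence"],
                   row["Protein.Group"], missing)


# Tiny in-memory example using the accessions from this post
# (only the four columns the function reads are included).
example = io.StringIO(
    "Run\tProtein.Group\tProtein.Ids\tModified.Sequence\n"
    "run_1\tWP_014262366.1\tADW16141.1;WP_014262366.1\tDIFNFISR\n"
)
for hit in rows_with_dropped_ids(example):
    print(hit)
```

For a real report, the same function can be given an open file handle, e.g. `open("report.tsv", newline="")`.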
Corresponding FASTA entries:
>WP_014262366.1 hypothetical protein [Filifactor alocis]
MKELSLEMKERIVEEILNEFRQGNAINGTFLIKKIFAITNNDLGEDVEKIIGMLEDSLDNSIYYIMETYS
EQLLTPGEYEAVNELQDFIDAVNYVKIAYDLILTSKDCMELKKMEKDFLELEKECKKDPDNQEKYELLQR
QRSEKIKKENDIVGTLLGIPISFIKKFPGFGTYAACTLEGGLLCLDKGTELLIRHREQLEDSLRDILKGT
GIIISLDAESSREIFRQRDIFNFISRSKLSKENKRMQESQEKFKKAWDDMKKVAGKVGEKIKDIVKGTEG
KINDNGVEKIKDAQGDVTRPYDPLVVDLNHNGFDLHSVENGVYFDLDNNGTKEKTSWVDKQDGFLVMDQN
GNGKIDTGAELFGEKVLLKNGQYSDGAIDVLSEFDENGDGIIDDKDSVFDKLMIWQDLNHNGISEEGELK
TLKEHHIVGLKLTDIQSHQRNIAGSTLRKSMTYIYEETVTNEKGEEIKSQKEGTIGEFLLAKDNIDTHDT
EQGQDMLSQLDLSKEDEAALYHTVKNLPDIRSFGRFKRLHNAMVLDKSGVLVGLVQQFQNSKNSAERENL
LEQILLFMADATDVDASTKGQYANAKHIKVLEHVFDNPLREGVLDKRLGQTYEDAYHDIKSVYYTTLSMQ
TSLKDMQEFFLSKENKLINIALLNKYLELQLLQDKENSEFLFEETTKVLMYLDSLGIEGFEQFKHHFGSL
STRYFHKFAELNVKQYRENTDNTTTYLYTKGVTIHAGDGNNIIQGTFNGPSGNDYIYAGKGNDMIYGGTG
SDTYFFEKGDGNDTIKESINAKDTNIVVFGKGIKKENLQIRRLNHHDVKITIKGTDDSLTIQGQIQNNEK
GAIDEFFFFDGERMTYKELKESANQITTGDDFIETTNDNDEIDLLSGDDTVYTKSGKDTIHGNAGADTIY
AGEGDDILYGDEGSDKLYGENGEDTLIGGTGDDYLNGGYGADTYLFHKGDGIDTIEEYDYNTQNIDKIQL
DKDIKKEDIILNRKGNDLEITFKAGNDKIIVKNQFANANSTIEQIVYGNNQIIAFQEMLDTTNQNSIKEH
LLQGAYTDDILVGSQESDTIHGYDGQDQITGGKGDDILDGGYGNDTYYYNKGDGSDIITDYSGNNTLILG
EGISKDKVVFTRVSREDIVMSIVGTEDKITIKNQWNNRTIDKVQFHDGSSLTYDQIKSIVNTPTDRDDYL
EGTNGADILEGGKGNDHLNGGYGGDTYVFSRGDGQDTIEDYSGGYEGVDKLIFKDINREDVIFSRESEKD
ITILVKNSNDKIKIKYGNNPYHAIEEIHFANGEVMTYEEMMKQPFEYYGDEQDNTINTYSTDDKIFAGAG
NDRIHAGDGNNIVYGGEGNDEIRSGSGNDILEGGKGNDHLNGGYGGDTYVFSRGDGQDTIEDYSGGYEGV
DKLIFKDINREDVIFSRESEKDITILVKNSNDKIKIKYGNNPYHAIEEIHFANGEVMTYEEMMKQPFEYY
GDEQDNTINTYSTDDKIFAGAGNDRIHAGDGNNIVYGGEGNDEIRSGRGNDTLVGGKGNDYLQGYYGADT
YIFSRGDGQDIVDENNSDNSHSVVDKIVFTDINREDVIFTKENNSDVTIKVKGSEDKVTIKNAHSNDWQI
EEIHFANGEVMTYEEMMKQPFEYYGDEKDNTINTYSTDDKIFAGAGNDRIHAGDGNNIVYGGEGNDEIRS
GRGNDTLVGGKGNDYLQGYYGADTYIFSRGDGQDIVDENNSDNSHSVVDKIVFTDINREDVIFTKENNSD
VTIKVKGSEDKVTIKNAHSNDWQIEEIHFANGEVMTYEEMMKQPFEYYGDEKDNTINTYSTDDKIFAGAG
NDRIHAGDGNNIVYGGEGNDEIRSGRGNDILEGGKGNDYLNGGYGGDTYIFHKGDGNDTIFDENGSQDKV
ITASDMLHTIFEKDGNDMRMTIAGREDSVTVKNWYSSDSYKIEEFHGEEKSMITSRQIDLLIQAMASFSQ
EKGISWSKAIEERPTEVEAVVQNFWAKQM
>ADW16141.1 type I secretion target GGXGXDXXX repeat (2 copies) [Filifactor alocis ATCC 35896]
MKELSLEMKERIVEEILNEFRQGNAINGTFLIKKIFAITNNDLGEDVEKIIGMLEDSLDNSIYYIMETYS
EQLLTPGEYEAVNELQDFIDAVNYVKIAYDLILTSKDCMELKKMEKDFLELEKECKKDPDNQEKYELLQR
QRSEKIKKENDIVGTLLGIPISFIKKFPGFGTYAACTLEGGLLCLDKGTELLIRHREQLEDSLRDILKGT
GIIISLDAESSREIFRQRDIFNFISRSKLSKENKRMQESQEKFKKAWDDMKKVAGKVGEKIKDIVKGTEG
KINDNGVEKIKDAQGDVTRPYDPLVVDLNHNGFDLHSVENGVYFDLDNNGTKEKTSWVDKQDGFLVMDQN
GNGKIDTGAELFGEKVLLKNGQYSDGAIDVLSEFDENGDGIIDDKDSVFDKLMIWQDLNHNGISEEGELK
TLKEHHIVGLKLTDIQSHQRNIAGSTLRKSMTYIYEETVTNEKGEEIKSQKEGTIGEFLLAKDNIDTHDT
EQGQDMLSQLDLSKEDEAALYHTVKNLPDIRSFGRFKRLHNAMVLDKSGVLVGLVQQFQNSKNSAERENL
LEQILLFMADATDVDASTKGQYANAKHIKVLEHVFDNPLREGVLDKRLGQTYEDAYHDIKSVYYTTLSMQ
TSLKDMQEFFLSKENKLINIALLNKYLELQLLQDKENSEFLFEETTKVLMYLDSLGIEGFEQFKHHFGSL
STRYFHKFAELNVKQYRENTDNTTTYLYTKGVTIHAGDGNNIIQGTFNGPSGNDYIYAGKGNDMIYGGTG
SDTYFFEKGDGNDTIKESINAKDTNIVVFGKGIKKENLQIRRLNHHDVKITIKGTDDSLTIQGQIQNNEK
GAIDEFFFFDGERMTYKELKESANQITTGDDFIETTNDNDEIDLLSGDDTVYTKSGKDTIHGNAGADTIY
AGEGDDILYGDEGSDKLYGENGEDTLIGGTGDDYLNGGYGADTYLFHKGDGIDTIEEYDYNTQNIDKIQL
DKDIKKEDIILNRKGNDLEITFKAGNDKIIVKNQFANANSTIEQIVYGNNQIIAFQEMLDTTNQNSIKEH
LLQGAYTDDILVGSQESDTIHGYDGQDQITGGKGDDILDGGYGNDTYYYNKGDGSDIITDYSGNNTLILG
EGISKDKVVFTRVSREDIVMSIVGTEDKITIKNQWNNRTIDKVQFHDGSSLTYDQIKSIVNTPTDRDDYL
EGTNGADILEGGKGNDHLNGGYGGDTYVFSRGDGQDTIEDYSGGYEGVDKLIFKDINREDVIFSRESEKD
ITILVKNSNDKIKIKYGNNPYHAIEEIHFANGEVMTYEEMMKQPFEYYGDEQDNTINTYSTDDKIFAGAG
NDRIHAGDGNNIVYGGEGNDEIRSGSGNDILEGGKGNDHLNGGYGGDTYVFSRGDGQDTIEDYSGGYEGV
DKLIFKDINREDVIFSRESEKDITILVKNSNDKIKIKYGNNPYHAIEEIHFANGEVMTYEEMMKQPFEYY
GDEQDNTINTYSTDDKIFAGAGNDRIHAGDGNNIVYGGEGNDEIRSGRGNDTLVGGKGNDYLQGYYGADT
YIFSRGDGQDIVDENNSDNSHSVVDKIVFTDINREDVIFTKENNSDVTIKVKGSEDKVTIKNAHSNDWQI
EEIHFANGEVMTYEEMMKQPFEYYGDEKDNTINTYSTDDKIFAGAGNDRIHAGDGNNIVYGGEGNDEIRS
GRGNDTLVGGKGNDYLQGYYGADTYIFSRGDGQDIVDENNSDNSHSVVDKIVFTDINREDVIFTKENNSD
VTIKVKGSEDKVTIKNAHSNDWQIEEIHFANGEVMTYEEMMKQPFEYYGDEKDNTINTYSTDDKIFAGAG
NDRIHAGDGNNIVYGGEGNDEIRSGRGNDILEGGKGNDYLNGGYGGDTYIFHKGDGNDTIFDENGSQDKV
ITASDMLHTIFEKDGNDMRMTIAGREDSVTVKNWYSSDSYKIEEFHGEEKSMITSRQIDLLIQAMASFSQ
EKGISWSKAIEERPTEVEAVVQNFWAKQM
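To confirm that the two entries really are identical, and to find other such duplicates in the same fasta file, a small check along these lines might help. It is a sketch only: it assumes standard FASTA input with ">" header lines and takes the first whitespace-delimited token of each header as the accession. The function name `identical_sequence_groups` is hypothetical and not part of DIA-NN.

```python
from collections import defaultdict


def identical_sequence_groups(fasta_lines):
    """Return lists of accessions that share exactly the same sequence."""
    sequences = {}  # accession -> concatenated sequence
    accession = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Accession = first token after ">" (assumption about the header layout)
            accession = line[1:].split()[0]
            sequences[accession] = ""
        elif accession is not None:
            sequences[accession] += line

    by_sequence = defaultdict(list)
    for acc, seq in sequences.items():
        by_sequence[seq].append(acc)
    # Keep only sequences that occur under more than one accession
    return [accs for accs in by_sequence.values() if len(accs) > 1]
```

Called on the lines of the search database (for example `identical_sequence_groups(open(path))`, with `path` pointing to the fasta file), it returns one list per set of accessions whose sequences match exactly.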