DiaNN
Ambiguity in identified UniProt names: too many options, and how to interpret this?
Hi DIA-NN !
I have a lot of proteins identified with long names containing up to 10 UniProt IDs.
- How should I interpret this multiple identification? Does it mean that the protein can actually be any UniProt ID from this long list?
- Does the order of these UniProt IDs carry any meaning (e.g. is the first UniProt ID the most likely match)?
- Any suggestions on how to deal with this in statistical analysis? Currently I just use the first UniProt ID in the list.
Thank you,
Ivan
Hi Ivan,
- What does the log look like? Using --relaxed-prot-inf (enabled by default in DIA-NN 1.8.1) will reduce the number of protein IDs. Each protein group represents a signal that could come from any of the proteins listed.
- The order within the group is not interpretable.
- Use gene-level evidence. You can either treat each 'gene group' as an independent entity (e.g. like a separate protein) or discard gene groups with more than one gene, unless these genes are clearly paralogues.
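The two gene-level options above can be sketched in Python as follows. This is only an illustration, not part of DIA-NN: the function names are hypothetical, the example gene groups are made up, and it assumes the members of a gene group are joined by semicolons, which should be checked against your own report. The paralogue exception is not automated here and would still need a manual check.

```python
def split_group(group, sep=";"):
    """Split a delimited group string into its member names (assumes ';' as separator)."""
    return [name.strip() for name in group.split(sep) if name.strip()]


def is_unambiguous(gene_group, sep=";"):
    """True if the gene group contains exactly one gene."""
    return len(split_group(gene_group, sep)) == 1


def select_gene_groups(gene_groups, keep_multi=False, sep=";"):
    """Return the distinct gene groups to carry into statistical analysis.

    keep_multi=True  : treat every gene group as its own independent entity.
    keep_multi=False : discard gene groups that contain more than one gene.
    """
    distinct = list(dict.fromkeys(gene_groups))  # de-duplicate, keep first-seen order
    if keep_multi:
        return distinct
    return [g for g in distinct if is_unambiguous(g, sep)]


# Made-up example values, for illustration only.
example = ["ALB", "HBA1;HBA2", "TUBB;TUBB4B;TUBB2A", "ALB", "GAPDH"]

print(select_gene_groups(example))                   # single-gene groups only
print(select_gene_groups(example, keep_multi=True))  # every group as its own entity
```

Note that neither option picks the first name from a group, since the order within a group carries no meaning.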
Best, Vadim
Hi, in my experience, using the UniProt databases for a DIA-NN search gives output with protein groups that contain a lot of accessions. Using the Swiss-Prot database reduces this problem, but also the total ID rate. Kind regards, Stella
With --relaxed-prot-inf (enabled by default in DIA-NN 1.8.1) the number of proteins in each group will be minimal. Yes, and even fewer with a Swiss-Prot-only database.
Vadim