Ambiguity in identified UniProt names => too many options & how to interpret this?

Open ihorrible opened this issue 2 years ago • 3 comments

Hi DIA-NN !

I have a lot of proteins identified with long names containing up to 10 UniProt IDs.

  1. How should I interpret these multiple identifications? Does it mean that the protein can actually be any UniProt ID from this long list?
  2. Does the order of these UniProt IDs carry any meaning (e.g. is the first UniProt ID most likely the best match)?
  3. Any suggestions on how to deal with this in statistical analysis? Currently I just use the first UniProt ID in the list.

Thank you,

Ivan

[Screenshot 2022-06-04 172621]

ihorrible avatar Jun 04 '22 14:06 ihorrible

Hi Ivan,

  1. What does the log look like? Using --relaxed-prot-inf (enabled by default in DIA-NN 1.8.1) will reduce the number of protein IDs. Each protein group represents signal that could come from any of the proteins listed.

  2. Order in the group is not interpretable.

  3. Use gene-level evidence. You can either treat each 'gene group' as an independent entity (i.e. like a separate protein) or discard gene groups with more than one gene, unless these genes are clearly paralogues.
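
The second option in point 3 can be sketched in a few lines of Python. This is only an illustration, not part of DIA-NN: it assumes the report rows have been loaded as dictionaries with a `Genes` field whose members are separated by `;`, and the helper names, the `Quantity` field, and the toy rows are all hypothetical.

```python
def split_group(value, sep=";"):
    """Split a group string such as 'GENE1;GENE2' into its members."""
    return [m.strip() for m in value.split(sep) if m.strip()]


def filter_gene_groups(rows, allowed_paralogues=()):
    """Keep rows with exactly one gene, or whose gene set is an
    explicitly allowed paralogue group (supplied by the analyst)."""
    allowed = [frozenset(group) for group in allowed_paralogues]
    kept = []
    for row in rows:
        genes = split_group(row["Genes"])  # assumed column name
        if len(genes) == 1 or frozenset(genes) in allowed:
            kept.append(row)
    return kept


# Toy rows for illustration only, not real DIA-NN output.
rows = [
    {"Genes": "ALB", "Quantity": 1200.0},
    {"Genes": "HBA1;HBA2", "Quantity": 800.0},
    {"Genes": "ACTB;TUBB;GAPDH", "Quantity": 300.0},
]

kept = filter_gene_groups(rows, allowed_paralogues=[("HBA1", "HBA2")])
print([row["Genes"] for row in kept])  # ['ALB', 'HBA1;HBA2']
```

Because the order within a group is not interpretable, the paralogue check compares sets rather than strings, so `HBA2;HBA1` would be kept as well.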

Best, Vadim

vdemichev avatar Jun 04 '22 14:06 vdemichev

Hi, in my experience, searching against the full UniProt database with DIA-NN gives output with these protein groups containing many accessions. Using the SwissProt database reduces this problem, but also the total ID rate. Kind regards, Stella

MarST07 avatar Jun 09 '22 14:06 MarST07

With --relaxed-prot-inf (enabled by default in DIA-NN 1.8.1) the number of proteins in each group will be minimal. Yes, there will be even fewer with a SwissProt-only database.
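
One way to see how much a setting or database change shrinks the groups is to count accessions per group in each run and compare the distributions. A minimal sketch, assuming the unique protein group strings have already been extracted from the report and use `;` as the separator; the function name is hypothetical and the accessions below are placeholders.

```python
from collections import Counter


def group_size_counts(protein_groups, sep=";"):
    """Map group size (number of accessions) to the number of groups of that size."""
    sizes = (
        len([acc for acc in group.split(sep) if acc.strip()])
        for group in protein_groups
    )
    return Counter(sizes)


# Placeholder accessions for illustration only.
groups = ["P12345", "P12345;Q67890", "A0A000;A0A001;A0A002", "Q11111"]

counts = group_size_counts(groups)
print(sorted(counts.items()))  # [(1, 2), (2, 1), (3, 1)]
```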

Vadim

vdemichev avatar Jun 09 '22 14:06 vdemichev