Issues with alteration frequency and profiled cases counting in the backend

Open onursumer opened this issue 2 years ago • 2 comments

  1. For genes that are not linked to any gene panel, we use the number of cases profiled for all genes. This does not seem correct: we should only count the cases profiled for that specific gene, plus the cases without gene panel data (we consider those cases as profiled for all genes). https://github.com/cBioPortal/cbioportal/blob/47a89c3cd79e4e2f5143cbc0a0b9bd6c3136e942/service/src/main/java/org/cbioportal/service/util/ProfiledCasesCounter.java#L112-L115
    1. This change was intended to solve a divide-by-zero issue (#9170).
    2. Calculated frequencies for certain genes are most likely incorrect now. We should probably display N/A, or just hide these genes when the number of profiled cases happens to be zero. (example query)
      1. When using only cases without gene panel data as a denominator: FIP1L1_casesWithoutPanelDataAsDenominator
      2. When using all profiled cases as a denominator: FIP1L1_allProfiledAsDenominator
    3. Similarly, using all profiled cases as the denominator results in lower reported frequencies in certain cases (example query)
      1. Frequencies calculated by oncoprint (frontend implementation): oncoprint_frontendFrequencyCalculation
      2. When using only cases without gene panel data as a denominator (WIP frequency endpoint): pathways_casesWithoutPanelDataAsDenominator
      3. When using all profiled cases as a denominator (WIP frequency endpoint): pathways_allProfiledAsDenominator
  2. When queried with multiple studies, the alteration count service may incorrectly report the number of profiled cases. This is because some of the involved studies may not have any alterations for certain genes, and we may end up ignoring all the profiled cases in those studies for a specific gene.
    1. Higher frequencies for certain genes when profiled cases from certain studies are ignored (WIP frequency endpoint): pathways_profiledCasesFromOtherStudiesIgnored
    2. There is a proposed fix for this issue: https://github.com/cBioPortal/cbioportal/pull/9722/files#diff-edc39d652f2b767349007ff1ebec116cfc4963d7f4ef94f78200c2f4cf6ec609R270-R279
  3. When counting alterations for a specific gene, we don't verify that an altered case is linked to that gene via a gene panel that actually contains it. If a case has a gene panel but has an alteration in a gene outside of that panel, then we should ignore that alteration.
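
To make the three points above concrete, here is a minimal sketch of the counting rules being proposed. It is not the actual `ProfiledCasesCounter` code: the `Case` and `Alteration` classes and the method names are hypothetical, and it assumes a case without gene panel data is represented by a `null` panel.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; these types do not exist in the cBioPortal codebase.
final class ProfiledCaseSketch {

    /** A case in a study. A null panel stands for "no gene panel data". */
    static final class Case {
        final String studyId;
        final String caseId;
        final Set<String> panelGenes; // assumption: null => profiled for all genes

        Case(String studyId, String caseId, Set<String> panelGenes) {
            this.studyId = studyId;
            this.caseId = caseId;
            this.panelGenes = panelGenes;
        }

        /** Point 1: profiled if the panel contains the gene, or there is no panel data. */
        boolean isProfiledFor(String gene) {
            return panelGenes == null || panelGenes.contains(gene);
        }

        /** Case IDs can repeat across studies, so the key includes the study. */
        String key() {
            return studyId + ":" + caseId;
        }
    }

    /** An alteration of one gene observed in one case. */
    static final class Alteration {
        final Case alteredCase;
        final String gene;

        Alteration(Case alteredCase, String gene) {
            this.alteredCase = alteredCase;
            this.gene = gene;
        }
    }

    /**
     * Point 2: the denominator is computed over every queried case,
     * including cases from studies that have no alteration for this gene.
     */
    static long profiledCount(Collection<Case> allQueriedCases, String gene) {
        return allQueriedCases.stream().filter(c -> c.isProfiledFor(gene)).count();
    }

    /**
     * Point 3: an alteration only counts if the altered case is actually
     * profiled for that gene; off-panel alterations are ignored.
     */
    static long alteredCount(Collection<Alteration> alterations, String gene) {
        Set<String> alteredKeys = new HashSet<>();
        for (Alteration a : alterations) {
            if (a.gene.equals(gene) && a.alteredCase.isProfiledFor(gene)) {
                alteredKeys.add(a.alteredCase.key());
            }
        }
        return alteredKeys.size();
    }
}
```

With these rules, a gene that is on no panel gets a denominator equal to the number of cases without gene panel data, which can be zero. That is the case the divide-by-zero fix was working around.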

onursumer, Sep 28 '22 20:09

2. Calculated frequencies for certain genes are most likely incorrect now. We should probably display `N/A`, or just hide these genes when the number of profiled cases happens to be zero.
  • Only include genes that are profiled according to the gene panels
  • For some of the excluded genes, there could be a curation opportunity to add them to the panel
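
A rough sketch of the `N/A` option, assuming the altered and profiled counts are already available as plain numbers. The `FrequencySketch` class and its method names are hypothetical, not part of the existing service:

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch; not the existing alteration count service.
final class FrequencySketch {

    /** Returns empty when no case is profiled for the gene, instead of dividing by zero. */
    static Optional<Double> frequency(long alteredCases, long profiledCases) {
        if (profiledCases <= 0) {
            return Optional.empty();
        }
        return Optional.of((double) alteredCases / profiledCases);
    }

    /** Formats the frequency as a percentage, or "N/A" when it is undefined. */
    static String display(long alteredCases, long profiledCases) {
        return frequency(alteredCases, profiledCases)
            .map(f -> String.format(Locale.ROOT, "%.1f%%", f * 100))
            .orElse("N/A");
    }
}
```

Hiding the gene instead would amount to filtering out the entries for which `frequency` returns an empty value.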

inodb, Oct 04 '22 15:10

related: https://github.com/cBioPortal/icebox/issues/177

jjgao, Oct 04 '22 15:10