cbioportal
Issues with alteration frequency and profiled cases counting in the backend
- For genes that are not linked to any gene panel, we use the number of profiled cases for all genes. This does not seem correct: we should only count the cases profiled for that specific gene, together with the cases without gene panel data (we consider those cases as profiled for all genes). https://github.com/cBioPortal/cbioportal/blob/47a89c3cd79e4e2f5143cbc0a0b9bd6c3136e942/service/src/main/java/org/cbioportal/service/util/ProfiledCasesCounter.java#L112-L115
- This change was intended to solve a divide by zero issue (#9170)
- Calculated frequencies for certain genes are most likely incorrect now. We should probably display `N/A`, or just hide these genes when the number of profiled cases happens to be zero. (example query)
- When using only cases without gene panel data as a denominator:
- When using all profiled cases as a denominator:
- Similarly, using all profiled cases as the denominator leads to lower reported frequencies in certain cases (example query)
- Frequencies calculated by oncoprint (frontend implementation):
- When using only cases without gene panel data as a denominator (WIP frequency endpoint):
- When using all profiled cases as a denominator (WIP frequency endpoint):
- When queried with multiple studies, the alteration count service may incorrectly report the number of profiled cases. This happens because some of the involved studies may not have any alterations for certain genes, so we may end up ignoring all the profiled cases in those studies for a specific gene.
- Higher frequencies for certain genes when profiled cases from certain studies are ignored (WIP frequency endpoint):
- There is a proposed fix for this issue: https://github.com/cBioPortal/cbioportal/pull/9722/files#diff-edc39d652f2b767349007ff1ebec116cfc4963d7f4ef94f78200c2f4cf6ec609R270-R279
- When counting alterations for a specific gene, we do not verify that the altered case is linked to a gene panel that actually contains that gene. If a case has a gene panel but has an alteration in a gene outside of that panel, we should ignore that alteration.
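
To make the counting rules discussed above concrete, here is a minimal sketch in plain Java. It is not the actual `ProfiledCasesCounter` code: `GeneFrequencySketch`, `SampleCase`, `isProfiled`, and `frequency` are hypothetical names, and the sketch assumes a case without gene panel data can be modelled as `panelGenes == null`. A case counts as profiled for a gene if it has no panel or its panel contains the gene, alterations in non-profiled cases are ignored, and an empty result stands in for `N/A` when no case is profiled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.OptionalDouble;
import java.util.Set;

class GeneFrequencySketch {

    /** Hypothetical case model; panelGenes == null means "no gene panel data". */
    static final class SampleCase {
        final Set<String> panelGenes;
        final Set<String> alteredGenes;

        SampleCase(Set<String> panelGenes, Set<String> alteredGenes) {
            this.panelGenes = panelGenes;
            this.alteredGenes = alteredGenes;
        }
    }

    /** Small helper to build a gene set. */
    static Set<String> genes(String... symbols) {
        return new HashSet<>(Arrays.asList(symbols));
    }

    /** A case is profiled for a gene if it has no panel, or its panel contains the gene. */
    static boolean isProfiled(SampleCase c, String gene) {
        return c.panelGenes == null || c.panelGenes.contains(gene);
    }

    /**
     * Alteration frequency over ALL queried cases (from every study), so that
     * studies without alterations still contribute to the denominator.
     * Returns empty (to be shown as N/A) when no case is profiled for the gene.
     */
    static OptionalDouble frequency(List<SampleCase> cases, String gene) {
        long profiled = 0;
        long altered = 0;
        for (SampleCase c : cases) {
            if (!isProfiled(c, gene)) {
                continue; // alterations outside the case's panel are ignored
            }
            profiled++;
            if (c.alteredGenes.contains(gene)) {
                altered++;
            }
        }
        return profiled == 0
                ? OptionalDouble.empty()
                : OptionalDouble.of((double) altered / profiled);
    }
}
```

Because the denominator is computed per gene over every queried case, a study with zero alterations for a gene still adds its profiled cases, which is the behaviour the multi-study bullet asks for.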
2. Calculated frequencies for certain genes are most likely incorrect now. We should probably display `N/A`, or just hide these genes when the number of profiled cases happens to be zero.
- Only include genes that are profiled according to the gene panels
- For some of these there could be a curation opportunity to add these genes to the panel
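
The two suggestions above could be sketched roughly as follows, assuming per-gene profiled and altered case counts are already available as maps. The class and method names (`ProfiledGeneFilterSketch`, `reportableGenes`, `curationCandidates`) are hypothetical and do not exist in the cBioPortal codebase. Genes with at least one profiled case are kept for display, while genes that have alterations but no profiled case are collected separately as possible panel curation candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ProfiledGeneFilterSketch {

    /** Genes with at least one profiled case, sorted by symbol. */
    static List<String> reportableGenes(Map<String, Integer> profiledCounts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(profiledCounts).entrySet()) {
            if (e.getValue() > 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Genes with alterations but zero profiled cases, sorted by symbol. */
    static List<String> curationCandidates(Map<String, Integer> alteredCounts,
                                           Map<String, Integer> profiledCounts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(alteredCounts).entrySet()) {
            boolean hasAlterations = e.getValue() > 0;
            boolean neverProfiled = profiledCounts.getOrDefault(e.getKey(), 0) == 0;
            if (hasAlterations && neverProfiled) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```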
related: https://github.com/cBioPortal/icebox/issues/177