ClickHouse-Only cBioPortal Demo based on standard schema
This branch adapts the cBioPortal production codebase to run directly against a deployed ClickHouse database. Its purpose is to demonstrate and explore the feasibility of eliminating the MySQL dependency entirely as part of our adoption of OLAP-based tools.
Describe changes proposed in this pull request:
- MySQL database driver replaced with the ClickHouse Java driver (v0.6.0)
- all table names converted to lowercase (underscore-delimited)
- all field names converted to lowercase (underscore-delimited)
- long IN(...) clauses shortened by removing unneeded whitespace
- debugging of the previous development branch, begun at the 2024 hackathon (in progress)
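The whitespace contraction of IN(...) clauses can be illustrated with a minimal Java sketch. It assumes the clause is rendered as a string of quoted sample identifiers; the class, its methods, and the sample IDs are hypothetical and not taken from the cBioPortal codebase, which may generate these clauses through its persistence-layer mappers instead. Dropping the padding saves roughly one character per element, which only matters when very long lists are passed.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that renders SQL IN (...) lists in padded and compact forms. */
public final class CompactInClause {

    private CompactInClause() {}

    /** Quotes a string literal: escapes backslashes and doubles single quotes. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'";
    }

    /** Padded form, e.g. IN ( 'a', 'b' ). */
    static String paddedInClause(List<String> ids) {
        return ids.stream()
                .map(CompactInClause::quote)
                .collect(Collectors.joining(", ", "IN ( ", " )"));
    }

    /** Compact form, e.g. IN('a','b'): same values, no padding. */
    static String compactInClause(List<String> ids) {
        return ids.stream()
                .map(CompactInClause::quote)
                .collect(Collectors.joining(",", "IN(", ")"));
    }

    public static void main(String[] args) {
        // Illustrative sample IDs, not real data.
        List<String> ids = List.of("S-0001", "S-0002", "S-0003");
        String padded = paddedInClause(ids);
        String compact = compactInClause(ids);
        System.out.println(padded);   // IN ( 'S-0001', 'S-0002', 'S-0003' )
        System.out.println(compact);  // IN('S-0001','S-0002','S-0003')
        // Saving is (n - 1) separators plus 3 characters of padding: 5 for n = 3.
        System.out.println("characters saved: " + (padded.length() - compact.length()));
    }
}
```

In production code, bound parameters are preferable to hand-quoted literals; the quoting here exists only to keep the sketch self-contained.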
Known issues and remaining work
- substantial functionality still needs to be debugged, including:
  - the sample counts on the home page
  - the results page
  - the patient view page
  - the alterationCounts endpoints for the study view page
- performance is still unacceptable when applying filters to large studies (genie), with durations exceeding 10 minutes. This suggests that we will also need to deploy additional ClickHouse strategies (RFC-80) for optimizing performance, primarily by avoiding passing long sample lists into persistence-layer IN (...) clauses
- as part of the OLAP rollout, new import functionality must be put in place. If the MySQL database is to be dropped, we will need a new process for migrating, validating, and importing database updates into ClickHouse; our schema definition (cgds.sql) will need to be reworked; and new seed databases will need to be made available in the new format.
- documentation needs extensive revision
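The specific RFC-80 strategies are not described here. As a generic illustration of the underlying idea (keeping long sample lists out of IN (...) clauses), the Java sketch below contrasts an inlined ID list, whose size grows with the study, with a subquery that lets ClickHouse resolve the sample set from a single bound study identifier. The lowercase table and column names (sample, patient, cancer_study, internal_id, and so on) are assumptions modeled on the renaming described in this PR; the class and the mutation table used in main are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical sketch: two ways to restrict a query to the samples of one study. */
public final class SampleFilterSql {

    private SampleFilterSql() {}

    /** Inlines every internal sample ID; the SQL text grows linearly with the sample count. */
    static String withInlinedSamples(String table, List<Integer> sampleIds) {
        String idList = sampleIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return "SELECT count(*) FROM " + table + " WHERE sample_id IN(" + idList + ")";
    }

    /**
     * Lets the database resolve the sample set from one bound parameter (?),
     * so the SQL text has a fixed size regardless of study size.
     * Table and column names are assumed, lowercase per the renaming in this branch.
     */
    static String withSampleSubquery(String table) {
        return "SELECT count(*) FROM " + table + " WHERE sample_id IN("
                + "SELECT s.internal_id FROM sample s"
                + " JOIN patient p ON s.patient_id = p.internal_id"
                + " JOIN cancer_study cs ON p.cancer_study_id = cs.cancer_study_id"
                + " WHERE cs.cancer_study_identifier = ?)";
    }

    public static void main(String[] args) {
        // Simulate a large study with 100,000 internal sample IDs (made-up values).
        List<Integer> ids = IntStream.rangeClosed(1, 100_000)
                .boxed()
                .collect(Collectors.toList());
        // "table" is concatenated into the SQL, so it must come from a fixed allowlist.
        String inlined = withInlinedSamples("mutation", ids);
        String subquery = withSampleSubquery("mutation");
        System.out.println("inlined query length:  " + inlined.length());
        System.out.println("subquery query length: " + subquery.length());
    }
}
```

The subquery form would be executed as a prepared statement with the study identifier bound to the placeholder. Whether ClickHouse plans it efficiently for a study the size of genie would need to be measured; the sketch makes no performance claim.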
Checks
- [ ] The commit log is comprehensible. It follows the 7 rules of great commit messages. We can fix this during merge by using a squash+merge if necessary.
- [ ] Has tests or has a separate issue that describes the types of test that should be created. If no test is included it should explicitly be mentioned in the PR why there is no test.
- [ ] Is this PR adding logic based on one or more clinical attributes? If yes, please make sure validation for this attribute is also present in the data validation / data loading layers (in backend repo) and documented in File-Formats Clinical data section!
- [ ] Make sure your PR has one of the labels defined in https://github.com/cBioPortal/cbioportal/blob/master/.github/release-drafter.yml
Any screenshots or GIFs?
If this is a new visual feature, please add a before/after screenshot or GIF here, e.g. with Giphy CAPTURE or Peek.
Notify reviewers
Read our Pull request merging policy. It can help to figure out who worked on the file before you. Please use git blame <filename> to determine that and notify them either through Slack or by assigning them as a reviewer on the PR.