[Feature][framework] Support using StarRocks as the storage/analysis engine
Search before asking
- [X] I have searched the issues and found no similar feature request.
Description
Why
Currently, users can choose MySQL or PostgreSQL as the database system to store and analyze collected data, and this works fine in most cases. However, it becomes limiting once the dataset grows too large to handle: we have had to shut down our demo server twice due to storage/memory exhaustion. It is therefore reasonable to expect that large-scale analysis would not be practical on a traditional RDBMS (MySQL/PostgreSQL).
What
I propose that we investigate adopting a distributed columnar DBMS, such as StarRocks, as our storage engine.
Pros:
- it supports the MySQL protocol
- it is distributed, so storage, memory, and computation can scale out horizontally instead of being bound to a single server
- it is columnar, which makes storage and analytical queries more efficient
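To illustrate why a columnar layout helps analytical queries, here is a minimal in-memory sketch in Go. It is a toy model under stated assumptions, not StarRocks's actual on-disk format, and the type and function names (`CommitRow`, `CommitColumns`, `sumAdditionsColumns`, etc.) are hypothetical. An aggregate over one field only needs that field's column, whereas a row store conceptually reads whole records.

```go
package main

import "fmt"

// CommitRow is a hypothetical row-oriented record: all fields are stored
// together, similar in spirit to a table row in a traditional RDBMS.
type CommitRow struct {
	Sha       string
	Author    string
	Additions int
	Deletions int
}

// CommitColumns is a hypothetical column-oriented layout: each field is
// kept in its own contiguous slice. Toy model only, not StarRocks internals.
type CommitColumns struct {
	Sha       []string
	Author    []string
	Additions []int
	Deletions []int
}

// toColumns converts row-oriented records into the columnar layout.
func toColumns(rows []CommitRow) CommitColumns {
	var c CommitColumns
	for _, r := range rows {
		c.Sha = append(c.Sha, r.Sha)
		c.Author = append(c.Author, r.Author)
		c.Additions = append(c.Additions, r.Additions)
		c.Deletions = append(c.Deletions, r.Deletions)
	}
	return c
}

// sumAdditionsRows iterates over whole records; in a real row store the
// full rows would be read even though only one field is needed.
func sumAdditionsRows(rows []CommitRow) int {
	total := 0
	for _, r := range rows {
		total += r.Additions
	}
	return total
}

// sumAdditionsColumns scans only the Additions column, a dense slice of a
// single type, which is cheaper to read and easier to compress.
func sumAdditionsColumns(c CommitColumns) int {
	total := 0
	for _, v := range c.Additions {
		total += v
	}
	return total
}

func main() {
	rows := []CommitRow{
		{"a1", "alice", 10, 2},
		{"b2", "bob", 5, 7},
		{"c3", "alice", 1, 0},
	}
	cols := toColumns(rows)
	fmt.Println(sumAdditionsRows(rows), sumAdditionsColumns(cols)) // 16 16
}
```

Both functions return the same result; the difference lies in how much data each layout has to touch, which is what matters once tables reach the sizes described above.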
Cons:
- it supports only a subset of standard SQL, so some queries might not be possible
- it may handle updates poorly (or not support them at all?) and may not support inserting rows in the middle of a table, so we might have to redesign our data collect/update/convert logic
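If in-place updates turn out to be expensive or unsupported, one possible redesign (an assumption to be checked during the investigation, not a decided approach) is to make collection append-only: every change is written as a new versioned row, and the current state of each entity is resolved by keeping the highest version per key. The Go sketch below shows that idea in memory; `IssueVersion` and `latestVersions` are hypothetical names, and whether StarRocks already handles this for us depends on its table models.

```go
package main

import (
	"fmt"
	"sort"
)

// IssueVersion is a hypothetical append-only record: instead of updating an
// existing row, the collector appends a new version whenever the issue changes.
type IssueVersion struct {
	ID      string // business key of the issue
	Version int64  // monotonically increasing, e.g. a collection timestamp
	Status  string
}

// latestVersions resolves the current state of each key by keeping only the
// row with the highest Version, so no row is ever modified in place.
func latestVersions(history []IssueVersion) []IssueVersion {
	latest := make(map[string]IssueVersion)
	for _, v := range history {
		if cur, ok := latest[v.ID]; !ok || v.Version > cur.Version {
			latest[v.ID] = v
		}
	}
	out := make([]IssueVersion, 0, len(latest))
	for _, v := range latest {
		out = append(out, v)
	}
	// Sort by ID so the output order is deterministic (map iteration is not).
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	history := []IssueVersion{
		{"ISSUE-1", 1, "open"},
		{"ISSUE-2", 1, "open"},
		{"ISSUE-1", 2, "in-progress"},
		{"ISSUE-1", 3, "done"},
	}
	for _, v := range latestVersions(history) {
		fmt.Println(v.ID, v.Status)
	}
	// ISSUE-1 done
	// ISSUE-2 open
}
```

The trade-off is extra storage for historical versions in exchange for write paths that only ever append, which is the access pattern columnar engines tend to favor.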
How
By introducing StarRocks, we could scale storage and memory well beyond a single server, which would make large-scale analysis possible. It may also open the door to supporting other kinds of big data.
I propose that we approach with the following steps:
- Assign an experienced developer to investigate StarRocks and evaluate the feasibility of adopting it.
- A report should be submitted to the Community within 5 workdays.
- The PPMC members should evaluate the report and make a decision within 3 workdays, while all committers are welcome to share their thoughts.
- We will schedule the implementation afterward
Use case
Users may run large-scale analysis in Apache DevLake
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct