[Feature][framework] Support using StarRocks as the storage/analysis engine

Open klesh opened this issue 3 years ago • 0 comments

Search before asking

[X] I had searched in the issues and found no similar feature requirement.

Description

Why

Currently, users can choose between MySQL and PostgreSQL (pg) as the database system to store and analyze collected data. This works fine in most cases, but it becomes a limitation when the dataset grows too large to handle. We have had to shut down our demo server twice due to storage/memory overflow, so we can reasonably project that large-scale analysis would not be practical with a traditional RDBMS (MySQL/PostgreSQL).

What

I propose that we investigate the possibility of adopting a distributed columnar database system, such as StarRocks, as our storage.

Pros:

It supports the MySQL protocol.
It is distributed, so storage, memory, and computation can scale out across nodes rather than being capped by a single machine.
It is columnar, which is more efficient for storage and for analytical queries.
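
To illustrate the columnar point, here is a minimal toy sketch in Python. It is not StarRocks code: the sample data and the helper names (`to_columns`, `sum_row_layout`, `sum_column_layout`) are made up for illustration. It counts how many field values each layout has to touch in order to sum a single column.

```python
# Toy comparison of row-oriented vs column-oriented layouts.
# Everything here is a hypothetical illustration, not StarRocks internals.

rows = [
    {"id": 1, "repo": "a", "additions": 10, "deletions": 2},
    {"id": 2, "repo": "b", "additions": 5, "deletions": 1},
    {"id": 3, "repo": "a", "additions": 7, "deletions": 0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_layout(rows, column):
    """Sum one column in a row layout; every field of every row is loaded."""
    touched = 0
    total = 0
    for row in rows:
        touched += len(row)  # the whole row is read to reach one field
        total += row[column]
    return total, touched


def sum_column_layout(columns, column):
    """Sum one column in a column layout; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_row_layout(rows, "additions")
col_total, col_touched = sum_column_layout(columns, "additions")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts return the same sum, but the row layout touches every field of every row while the column layout touches only the values of the requested column. Real engines add compression and vectorized execution on top of this, which the sketch does not model.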

Cons:

It supports only a subset of standard SQL statements, so some queries might not be possible.
It is weak at updates (possibly not supported?) and at insertion (inserting rows in the middle of a table?), so we may have to redesign our data collect-update-convert logic.
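
If in-place updates do turn out to be a problem, one possible direction for the redesign is to make writes append-only and resolve the latest version of each record at read time. The Python sketch below only illustrates that idea under stated assumptions: the `append_version` and `latest_by_key` helpers, the `key`/`version` fields, and the sample records are all hypothetical, and whether StarRocks actually requires such a pattern is exactly what the investigation should determine.

```python
# Hypothetical sketch: replace in-place UPDATEs with append-only writes
# and pick the newest version per key when reading.


def append_version(log, key, version, record):
    """Append a new version of a record instead of updating it in place."""
    log.append({"key": key, "version": version, **record})


def latest_by_key(log):
    """Return the highest-version entry for each key."""
    latest = {}
    for entry in log:
        current = latest.get(entry["key"])
        if current is None or entry["version"] > current["version"]:
            latest[entry["key"]] = entry
    return latest


log = []
append_version(log, "issue-1", 1, {"status": "TODO"})
append_version(log, "issue-2", 1, {"status": "TODO"})
append_version(log, "issue-1", 2, {"status": "DONE"})  # an "update" is a new row

current = latest_by_key(log)
print(current["issue-1"]["status"])
print(current["issue-2"]["status"])
```

The trade-off in this sketch is that the log keeps every historical version, so reads must deduplicate and old versions need periodic cleanup.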

How

By introducing StarRocks, storage and memory would no longer be capped by a single machine, so support for large-scale analysis becomes possible. We may also be able to support other kinds of big data.

I propose that we approach with the following steps:

Assign a veteran developer to investigate StarRocks and evaluate the feasibility of adopting it.
A report should be submitted to the community within 5 workdays.
The PPMC members should evaluate the report and make a decision within 3 workdays, while all committers may share their thoughts.
We will schedule the implementation afterward.

Use case

Users may run large-scale analysis in Apache DevLake

Related issues

No response

Are you willing to submit a PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Jul 19 '22 08:07 klesh