trino icon indicating copy to clipboard operation
trino copied to clipboard

Add StarRocks Connector

Open chenjian2664 opened this issue 1 year ago • 12 comments

StarRocks https://www.starrocks.io/

chenjian2664 avatar May 03 '23 08:05 chenjian2664

Re-open it if we think we need it.

chenjian2664 avatar May 20 '23 03:05 chenjian2664

Is there a more comprehensive description of why this issue was closed?

ukclivecox avatar Sep 06 '23 06:09 ukclivecox

Probably because @chenjian2664 no longer needs it?

hashhar avatar Sep 06 '23 06:09 hashhar

@hashhar Thanks for the attention, let me open it. Hopefully it can get reviewed

chenjian2664 avatar Sep 07 '23 08:09 chenjian2664

@chenjian2664 have you tried using the MySQL connector? Starrocks is mysql compatible. Unable to test this as I don't have starrocks setup https://docs.starrocks.io/en-us/main/integrations/IDE_integrations/DataGrip

jakemongaya avatar Sep 15 '23 06:09 jakemongaya

@jakemongaya I tried. StarRocks is a OLAP database system, but it is usual that there will be multi ways to interact with the database, it's worth to add a connector for it.

Starrocks is mysql compatible

I am not sure how much the database can achieve. Even the database can fully make it, but in that way there might loose lots of chances for optimization of StarRocks, some best practice for MySQL may not suitable for the StarRocks.

chenjian2664 avatar Sep 26 '23 13:09 chenjian2664

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

brunomarram avatar Nov 01 '23 20:11 brunomarram

Expecting, Is there some progress on this?

Wahno avatar Dec 07 '23 03:12 Wahno

+1

YuriyGavrilov avatar Dec 14 '23 18:12 YuriyGavrilov

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram , for now you can set below parameters to StarRocks in a mysql client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';
set global sql_dialect = 'trino';

Then you will see all tables stored in the default catalog of StarRocks or Apache Doris. Recommend to config mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: https://github.com/trinodb/trino/pull/17330

hackeryang avatar Dec 26 '23 06:12 hackeryang

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram , for now you can set below parameters to StarRocks in a mysql client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';

Then you will see all tables stored in StarRocks or Apache Doris. Recommend to config mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: #17330

mysql-starrocks.properties

Hello, thank you very much for providing the configuration to support trino connecting to starrocks. I have configured it according to the above, but I still can’t see the table, only the library. The used trino:430, starrocks:3.1.8, I don’t know about you. What versions can be queried during edge testing? There is another question to ask. There are two catalogs in starrocks, one is default_catalog and the other is hive_catalog. I have given partial database table permissions to the above two catalogs in trino (mysql-starrocks.properties). I only I can see the libraries under default_catalog, but not the rest. Have you ever encountered this situation?

A-little-bit-of-data avatar Apr 29 '24 03:04 A-little-bit-of-data

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram , for now you can set below parameters to StarRocks in a mysql client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';

Then you will see all tables stored in StarRocks or Apache Doris. Recommend to config mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: #17330

mysql-starrocks.properties

Hello, thank you very much for providing the configuration to support trino connecting to starrocks. I have configured it according to the above, but I still can’t see the table, only the library. The used trino:430, starrocks:3.1.8, I don’t know about you. What versions can be queried during edge testing? There is another question to ask. There are two catalogs in starrocks, one is default_catalog and the other is hive_catalog. I have given partial database table permissions to the above two catalogs in trino (mysql-starrocks.properties). I only I can see the libraries under default_catalog, but not the rest. Have you ever encountered this situation?

Hi @A-little-bit-of-data , we use Trino 423 and StarRocks 3.2 (without the storage-compute separation mode).

The key parameter to see tables in StarRocks is the set global enable_groupby_use_output_alias=true;, it only can see tables in the default catalog stored inside StarRocks, and cannot see tables in the hive catalog outside StarRocks for now, because i think that the concept of catalog itself means a storage engine, Trino queries other storages instead of other computing engines, so we cannot have such a querying chain: Trino->StarRocks->Hive.

I understand why you want to have such a querying chain, because StarRocks 3.x (with the storage-compute separation mode) can querying Hive 3X faster than Trino for now(with the C++ X86 AVX2 vectorization SIMD), but please be patient, the Project HummingBird is working on: https://github.com/trinodb/trino/issues/14237

hackeryang avatar May 07 '24 09:05 hackeryang