trino
trino copied to clipboard
Add StarRocks Connector
StarRocks https://www.starrocks.io/
Re-open it if we think we need it.
Is there a more comprehensive description of why this issue was closed?
Probably because @chenjian2664 no longer needs it?
@hashhar Thanks for the attention, let me open it. Hopefully it can get reviewed
@chenjian2664 have you tried using the MySQL connector? Starrocks is mysql compatible. Unable to test this as I don't have starrocks setup https://docs.starrocks.io/en-us/main/integrations/IDE_integrations/DataGrip
@jakemongaya I tried. StarRocks is a OLAP database system, but it is usual that there will be multi ways to interact with the database, it's worth to add a connector for it.
Starrocks is mysql compatible
I am not sure how much the database can achieve. Even the database can fully make it, but in that way there might loose lots of chances for optimization of StarRocks, some best practice for MySQL may not suitable for the StarRocks.
Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas
Expecting, Is there some progress on this?
+1
Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas
Hi @brunomarram , for now you can set below parameters to StarRocks
in a mysql client:
set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';
set global sql_dialect = 'trino';
Then you will see all tables stored in the default
catalog of StarRocks or Apache Doris
. Recommend to config mysql-starrocks.properties
like this:
connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false
But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: https://github.com/trinodb/trino/pull/17330
Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas
Hi @brunomarram , for now you can set below parameters to
StarRocks
in a mysql client:set global enable_groupby_use_output_alias=true; set global enable_profile = true; set global big_query_profile_threshold = '120s'; set global runtime_profile_report_interval = 60; set global sql_mode='SORT_NULLS_LAST';
Then you will see all tables stored in StarRocks or
Apache Doris
. Recommend to configmysql-starrocks.properties
like this:connector.name=mysql connection-url=jdbc:mysql://starrocks_fe_hostname:9030 connection-user=root connection-password= insert.non-transactional-insert.enabled=true join-pushdown.enabled=true metadata.cache-ttl=10m metadata.cache-missing=true query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN statistics.enabled=false mysql.jdbc.use-information-schema=false
But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: #17330
mysql-starrocks.properties
Hello, thank you very much for providing the configuration to support trino connecting to starrocks. I have configured it according to the above, but I still can’t see the table, only the library. The used trino:430, starrocks:3.1.8, I don’t know about you. What versions can be queried during edge testing? There is another question to ask. There are two catalogs in starrocks, one is default_catalog and the other is hive_catalog. I have given partial database table permissions to the above two catalogs in trino (mysql-starrocks.properties). I only I can see the libraries under default_catalog, but not the rest. Have you ever encountered this situation?
Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas
Hi @brunomarram , for now you can set below parameters to
StarRocks
in a mysql client:set global enable_groupby_use_output_alias=true; set global enable_profile = true; set global big_query_profile_threshold = '120s'; set global runtime_profile_report_interval = 60; set global sql_mode='SORT_NULLS_LAST';
Then you will see all tables stored in StarRocks or
Apache Doris
. Recommend to configmysql-starrocks.properties
like this:connector.name=mysql connection-url=jdbc:mysql://starrocks_fe_hostname:9030 connection-user=root connection-password= insert.non-transactional-insert.enabled=true join-pushdown.enabled=true metadata.cache-ttl=10m metadata.cache-missing=true query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN statistics.enabled=false mysql.jdbc.use-information-schema=false
But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: #17330
mysql-starrocks.properties
Hello, thank you very much for providing the configuration to support trino connecting to starrocks. I have configured it according to the above, but I still can’t see the table, only the library. The used trino:430, starrocks:3.1.8, I don’t know about you. What versions can be queried during edge testing? There is another question to ask. There are two catalogs in starrocks, one is default_catalog and the other is hive_catalog. I have given partial database table permissions to the above two catalogs in trino (mysql-starrocks.properties). I only I can see the libraries under default_catalog, but not the rest. Have you ever encountered this situation?
Hi @A-little-bit-of-data , we use Trino 423 and StarRocks 3.2 (without the storage-compute separation mode).
The key parameter to see tables in StarRocks is the set global enable_groupby_use_output_alias=true;
, it only can see tables in the default
catalog stored inside StarRocks, and cannot see tables in the hive
catalog outside StarRocks for now, because i think that the concept of catalog
itself means a storage engine
, Trino queries other storages instead of other computing engines, so we cannot have such a querying chain: Trino->StarRocks->Hive
.
I understand why you want to have such a querying chain, because StarRocks 3.x (with the storage-compute separation mode) can querying Hive 3X faster than Trino for now(with the C++ X86 AVX2 vectorization SIMD), but please be patient, the Project HummingBird
is working on: https://github.com/trinodb/trino/issues/14237