[Good First Issue] StarRocks Hands-on Tasks 2024
Hi Rockstars,
This is a list of proposed hands-on tasks. If you're new to StarRocks and eager to engage with the community, these issues are well suited for gaining hands-on experience and becoming familiar with StarRocks development :) This is also an open list, so you are welcome to propose more tasks.
To book an issue, please mention @kateshaowanjou or @wangsimo0 and add a comment in the issue you picked, so that it won't be assigned to others. Always discuss the design with the community before you start developing. Some of the issues are really big, so don't hesitate to seek help from the community.
External Catalog related issues
Information Schema in External Catalog
In version 3.2 and later, StarRocks improves compatibility with BI tools by supporting the information_schema database in External Catalog. This database is a valuable source of structured metadata. Several views within information_schema currently return empty results, and work is underway to support them fully. Because StarRocks follows the MySQL protocol, its information_schema follows MySQL's pattern. We should maintain compatibility with MySQL, provide as much information as we can, and keep queries against these views efficient to minimize the time consumed.
- [ ] Columns view
- [ ] Views view
Trino's Compatibility Issues
In version 3.0 and later, StarRocks supports Trino's SQL_dialect mode; however, ongoing enhancements are necessary to further optimize this functionality.
New Functions
- [ ] inverse_normal_cdf and normal_cdf @241600489 #38989
- [ ] typeof @MicePilot #36245
- [ ] regexp_split #37089
- [x] boolor_agg,boolxor_agg,booland_agg #22949
- [ ] from_iso8601_date(string),from_iso8601_timestamp(string) #40877
- [ ] array_agg in window function #40881 @mygrsun
- [ ] cardinality in HLL data type #40879
- [ ] count(distinct) window function #46105 @yangzho12138
Function Mapping
Trino's function/expression | StarRocks' function/expression | comment | assignee
---|---|---|---
map_agg(key, value) → map<K,V> | map() | |
show schemas from <catalog_name> | Show databases from <catalog_name> | |
array_sort(array(T), function(T, T, int)) -> array(T) | array_sortby( | This one needs to pay attention to the input order. |
sequence(start, stop), sequence(start, stop, step) for integer data types | array_generate([start,] end [, step]) | |
last_day_of_month(x) → date | last_day(x,'month'); | |
map_from_entries(array(row(K, V))) -> map(K, V) | map_from_arrays | This one needs to pay attention to the transformation. SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); is equivalent to SELECT map_from_arrays([1,2],['x','y']); |
current_catalog | catalog() | thanks to @macroguo-ghy |
current_schema | database() | thanks to @macroguo-ghy |
slice(x, start, length) → array | array_slice(input, offset, length) | |
approx_set(x) → HyperLogLog | HLL_HASH(column_name) | |
empty_approx_set() → HyperLogLog | HLL_EMPTY() | |
merge(HyperLogLog) → HyperLogLog | HLL_RAW_AGG(hll) | |
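
The map_from_entries row is the one mapping that needs a reshaping step rather than a rename: a single array of (key, value) rows has to be split into a key array and a value array. Below is a minimal Python sketch of that step, not StarRocks' actual translator code. The helper names `sql_literal` and `map_from_entries_to_arrays` are hypothetical, and the quote-doubling escape assumes MySQL-style string literals.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (hypothetical helper).

    Strings are single-quoted with embedded quotes doubled, which assumes
    MySQL-style string literals; other values use their str() form.
    """
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def map_from_entries_to_arrays(entries):
    """Rewrite the argument of Trino's map_from_entries as a map_from_arrays call.

    `entries` is a list of (key, value) pairs, mirroring ARRAY[(k, v), ...].
    The pairs are split into one key array and one value array, in order.
    """
    keys_sql = ",".join(sql_literal(key) for key, _ in entries)
    values_sql = ",".join(sql_literal(value) for _, value in entries)
    return f"map_from_arrays([{keys_sql}],[{values_sql}])"


# The example from the table: ARRAY[(1, 'x'), (2, 'y')]
rewritten = map_from_entries_to_arrays([(1, "x"), (2, "y")])
print("SELECT " + rewritten + ";")
```

For the table's example, this prints `SELECT map_from_arrays([1,2],['x','y']);`. Keys and values must stay in the same order so that each key still lines up with its value.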
Other Enhancements
- [ ] Apache Ranger's policy translator
StarRocks supports using the Hive service in Ranger to control access to Hive tables. However, we found that some community users still want to manage all privileges in the StarRocks Ranger service, so we need a translator (maybe a script).
- [x] Add catalog information in FE's query_detail @happut
After enabling query detail collection with admin set frontend config("enable_collect_query_detail_info"="true"), a user can fetch query details with curl -uroot: http://172.26.81.138:8030/api/query_detail?event_time=<unixtimestamp_value>. The returned information looks like ...."database":"simo","sql":"insert into abc values (1,2),(2,3)","user":"root".... It contains no catalog information, such as "catalog":"default_catalog".
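
To make the query_detail gap concrete, here is a small Python sketch that builds the request URL and flags records without a "catalog" field. It assumes the endpoint returns a JSON array of objects. The sample payload is assembled from the fragment quoted above and is not a full real response, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def query_detail_url(host, event_time, port=8030):
    """Build the FE query_detail URL for a given unix timestamp."""
    query = urlencode({"event_time": event_time})
    return f"http://{host}:{port}/api/query_detail?{query}"


def records_missing_catalog(payload):
    """Return the records in a JSON array that have no "catalog" key.

    Assumption: the response body is a JSON array of query detail objects.
    """
    records = json.loads(payload)
    return [record for record in records if "catalog" not in record]


# Sample built from the fields quoted in the task description (not a full response).
before = '[{"database":"simo","sql":"insert into abc values (1,2),(2,3)","user":"root"}]'
# What the same record might look like once the catalog is reported.
after = '[{"catalog":"default_catalog","database":"simo","user":"root"}]'

print(query_detail_url("172.26.81.138", 1700000000))
print(len(records_missing_catalog(before)), len(records_missing_catalog(after)))
```

A check like this could serve as a quick regression test for the enhancement: every record should carry a "catalog" value once the field is added.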
Apache Hudi & Delta Lake Compatibilities
- [ ] Add Hudi sink (✨ HIGH priority)
- [ ] Add Delta Lake sink (✨ HIGH priority)
More Connectors
- [ ] Oracle catalog
- [ ] Kudu catalog @predator4ann
- [ ] StarRocks catalog
- [ ] Greenplum catalog
- [ ] SQL Server catalog
- [x] ClickHouse catalog
- [ ] Trino catalog
- [ ] DB2 catalog
- [ ] Druid catalog
- [ ] OceanBase catalog
- [ ] SAP HANA catalog
More Compatibilities
- [ ] Hive UDF compatibility
- [ ] Spark SQL compatible structure
- [ ] Hive SQL compatible structure
- [ ] Impala SQL compatible structure
I'd add Iceberg tagging and branch queries:
https://github.com/StarRocks/starrocks/issues/37959
I want to pick #38989 @wangsimo0
I want to pick #40881 @wangsimo0
I want to pick #37089 @wangsimo0
> I want to pick #37089 @wangsimo0

You also need to comment under issue #37089 so I can assign it to you. If you run into any problems during development, I can introduce you to the relevant discussion group. https://853921.ma3you.cn/articles/b12e90J/
I want to pick #46105 @wangsimo0
@wangsimo0 Hi, I want to add Delta Lake Compatibilities. Has this requirement been resolved?
> @wangsimo0 Hi, I want to add Delta Lake Compatibilities. Has this requirement been resolved?

Are you referring to the "Add Delta Lake sink" function? No one is working on it at the moment, and it'd be awesome if you are willing to give it a try! 😎
Sure thing! I'd be happy to take this on.
@kateshaowanjou @wangsimo0 Can I pick this issue: https://github.com/StarRocks/starrocks/issues/38989 if it's not being worked on by anyone?
Sure thing! I'd be happy to take this on.
This issue is not the easiest one, so feel free to add my WeChat: wanjoushao if you need help!
@kateshaowanjou @wangsimo0 We are migrating from Trino to StarRocks and working on the functions. Can I pick the map_agg issue?
I want to pick this issue #46060 @wangsimo0 @kateshaowanjou