incubator-gluten
incubator-gluten copied to clipboard
[CH-158] Support functions for clickhouse backend: lower/upper/ltrim/rtrim
What changes were proposed in this pull request?
Close https://github.com/Kyligence/ClickHouse/issues/158 Related PR: https://github.com/Kyligence/ClickHouse/pull/117
Support more functions
- lower
- upper
- ltrim(when trimStr is None)
- rtrim(when trimStr is None)
How was this patch tested?
upper:
0: jdbc:hive2://localhost:10000> explain formatted select upper(l_comment) from lineitem where l_comment like '%sly%' limit 10;
+----------------------------------------------------+
| plan |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- * Project (5)
+- BlockNativeColumnarToRowExec (4)
+- * FilterExecTransformer (2)
+- Scan parquet (1)
(1) Scan parquet
Output [1]: [l_comment#15]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>
(2) FilterExecTransformer
Input [1]: [l_comment#15]
Arguments: (isnotnull(l_comment#15) AND Contains(l_comment#15, sly))
(3) WholeStageCodegenTransformer (1)
Input [1]: [l_comment#15]
(4) BlockNativeColumnarToRowExec
Input [1]: [l_comment#15]
(5) Project [codegen id : 1]
Output [1]: [upper(l_comment#15) AS upper(l_comment)#46]
Input [1]: [l_comment#15]
(6) CollectLimit
Input [1]: [upper(l_comment)#46]
Arguments: 10
lower
0: jdbc:hive2://localhost:10000> explain formatted select lower(l_comment) from lineitem where l_comment like '%sly%' limit 10
. . . . . . . . . . . . . . . .> ;
+----------------------------------------------------+
| plan |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- BlockNativeColumnarToRowExec (5)
+- * ProjectExecTransformer (3)
+- * FilterExecTransformer (2)
+- Scan parquet (1)
(1) Scan parquet
Output [1]: [l_comment#15]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>
(2) FilterExecTransformer
Input [1]: [l_comment#15]
Arguments: (isnotnull(l_comment#15) AND Contains(l_comment#15, sly))
(3) ProjectExecTransformer
Input [1]: [l_comment#15]
Arguments: [lower(l_comment#15) AS lower(l_comment)#38]
(4) WholeStageCodegenTransformer (1)
Input [1]: [lower(l_comment)#38]
(5) BlockNativeColumnarToRowExec
Input [1]: [lower(l_comment)#38]
(6) CollectLimit
Input [1]: [lower(l_comment)#38]
Arguments: 10
ltrim:
0: jdbc:hive2://localhost:10000> explain formatted select ltrim(l_comment) from lineitem where l_comment like '%sly%'
. . . . . . . . . . . . . . . .> ;
+----------------------------------------------------+
| plan |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- * Project (5)
+- BlockNativeColumnarToRowExec (4)
+- * FilterExecTransformer (2)
+- Scan parquet (1)
(1) Scan parquet
Output [1]: [l_comment#224]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>
(2) FilterExecTransformer
Input [1]: [l_comment#224]
Arguments: (isnotnull(l_comment#224) AND Contains(l_comment#224, sly))
(3) WholeStageCodegenTransformer (1)
Input [1]: [l_comment#224]
(4) BlockNativeColumnarToRowExec
Input [1]: [l_comment#224]
(5) Project [codegen id : 1]
Output [1]: [lower(l_comment#224) AS lower(l_comment)#338]
Input [1]: [l_comment#224]
(6) CollectLimit
Input [1]: [lower(l_comment)#338]
Arguments: 10
rtrim
0: jdbc:hive2://localhost:10000> explain formatted select rtrim(l_comment) from lineitem where l_comment like '%sly%' limit 10 ;
+----------------------------------------------------+
| plan |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- BlockNativeColumnarToRowExec (5)
+- * ProjectExecTransformer (3)
+- * FilterExecTransformer (2)
+- Scan parquet (1)
(1) Scan parquet
Output [1]: [l_comment#15]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>
(2) FilterExecTransformer
Input [1]: [l_comment#15]
Arguments: (isnotnull(l_comment#15) AND Contains(l_comment#15, sly))
(3) ProjectExecTransformer
Input [1]: [l_comment#15]
Arguments: [rtrim(l_comment#15, None) AS rtrim(l_comment)#46]
(4) WholeStageCodegenTransformer (1)
Input [1]: [rtrim(l_comment)#46]
(5) BlockNativeColumnarToRowExec
Input [1]: [rtrim(l_comment)#46]
(6) CollectLimit
Input [1]: [rtrim(l_comment)#46]
Arguments: 10
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/oap-project/gluten/issues
Then could you also rename commit message and pull request title in the following format?
[Gluten-${ISSUES_ID}] ${detailed message}
See also:
please rebase to main and add some ut.
please rebase to main and add some ut.
Sorry, I'am new to gluten, where do you advice to add uts, thanks!
please refer to gluten-ut module or the test cases in https://github.com/oap-project/gluten/tree/main/backends-clickhouse/src/test/scala/io/glutenproject/execution
please refer to gluten-ut module or the test cases in https://github.com/oap-project/gluten/tree/main/backends-clickhouse/src/test/scala/io/glutenproject/execution
It is done.
this is duplicate with PR #451
I've tested ltrim and rtrim functions for Spark SQL, and it does also work. @rui-mo