incubator-gluten icon indicating copy to clipboard operation
incubator-gluten copied to clipboard

[CH-158] Support functions for clickhouse backend: lower/upper/ltrim/rtrim

Open taiyang-li opened this issue 2 years ago • 5 comments

What changes were proposed in this pull request?

Close https://github.com/Kyligence/ClickHouse/issues/158 Related PR: https://github.com/Kyligence/ClickHouse/pull/117

Support more functions

  • lower
  • upper
  • ltrim(when trimStr is None)
  • rtrim(when trimStr is None)

How was this patch tested?

upper:

0: jdbc:hive2://localhost:10000>  explain formatted select upper(l_comment) from lineitem where l_comment like '%sly%' limit 10;
+----------------------------------------------------+
|                        plan                        |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- * Project (5)
   +- BlockNativeColumnarToRowExec (4)
      +- * FilterExecTransformer (2)
         +- Scan parquet  (1)


(1) Scan parquet 
Output [1]: [l_comment#15]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>

(2) FilterExecTransformer
Input [1]: [l_comment#15]
Arguments: (isnotnull(l_comment#15) AND Contains(l_comment#15, sly))

(3) WholeStageCodegenTransformer (1)
Input [1]: [l_comment#15]

(4) BlockNativeColumnarToRowExec
Input [1]: [l_comment#15]

(5) Project [codegen id : 1]
Output [1]: [upper(l_comment#15) AS upper(l_comment)#46]
Input [1]: [l_comment#15]

(6) CollectLimit
Input [1]: [upper(l_comment)#46]
Arguments: 10

lower

0: jdbc:hive2://localhost:10000> explain formatted select lower(l_comment) from lineitem where l_comment like '%sly%' limit 10
. . . . . . . . . . . . . . . .> ; 
+----------------------------------------------------+
|                        plan                        |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- BlockNativeColumnarToRowExec (5)
   +- * ProjectExecTransformer (3)
      +- * FilterExecTransformer (2)
         +- Scan parquet  (1)


(1) Scan parquet 
Output [1]: [l_comment#15]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>

(2) FilterExecTransformer
Input [1]: [l_comment#15]
Arguments: (isnotnull(l_comment#15) AND Contains(l_comment#15, sly))

(3) ProjectExecTransformer
Input [1]: [l_comment#15]
Arguments: [lower(l_comment#15) AS lower(l_comment)#38]

(4) WholeStageCodegenTransformer (1)
Input [1]: [lower(l_comment)#38]

(5) BlockNativeColumnarToRowExec
Input [1]: [lower(l_comment)#38]

(6) CollectLimit
Input [1]: [lower(l_comment)#38]
Arguments: 10

ltrim:

0: jdbc:hive2://localhost:10000> explain formatted select ltrim(l_comment) from lineitem where l_comment like '%sly%' 
. . . . . . . . . . . . . . . .> ;
+----------------------------------------------------+
|                        plan                        |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- * Project (5)
   +- BlockNativeColumnarToRowExec (4)
      +- * FilterExecTransformer (2)
         +- Scan parquet  (1)


(1) Scan parquet 
Output [1]: [l_comment#224]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>

(2) FilterExecTransformer
Input [1]: [l_comment#224]
Arguments: (isnotnull(l_comment#224) AND Contains(l_comment#224, sly))

(3) WholeStageCodegenTransformer (1)
Input [1]: [l_comment#224]

(4) BlockNativeColumnarToRowExec
Input [1]: [l_comment#224]

(5) Project [codegen id : 1]
Output [1]: [lower(l_comment#224) AS lower(l_comment)#338]
Input [1]: [l_comment#224]

(6) CollectLimit
Input [1]: [lower(l_comment)#338]
Arguments: 10

rtrim

0: jdbc:hive2://localhost:10000> explain formatted select rtrim(l_comment) from lineitem where l_comment like '%sly%' limit 10  ;  
+----------------------------------------------------+
|                        plan                        |
+----------------------------------------------------+
| == Physical Plan ==
CollectLimit (6)
+- BlockNativeColumnarToRowExec (5)
   +- * ProjectExecTransformer (3)
      +- * FilterExecTransformer (2)
         +- Scan parquet  (1)


(1) Scan parquet 
Output [1]: [l_comment#15]
Batched: true
Location: InMemoryFileIndex [file:/data1/liyang/cppproject/gluten/jvm/src/test/resources/tpch-data/lineitem]
PushedFilters: [IsNotNull(l_comment), StringContains(l_comment,sly)]
ReadSchema: struct<l_comment:string>

(2) FilterExecTransformer
Input [1]: [l_comment#15]
Arguments: (isnotnull(l_comment#15) AND Contains(l_comment#15, sly))

(3) ProjectExecTransformer
Input [1]: [l_comment#15]
Arguments: [rtrim(l_comment#15, None) AS rtrim(l_comment)#46]

(4) WholeStageCodegenTransformer (1)
Input [1]: [rtrim(l_comment)#46]

(5) BlockNativeColumnarToRowExec
Input [1]: [rtrim(l_comment)#46]

(6) CollectLimit
Input [1]: [rtrim(l_comment)#46]
Arguments: 10

taiyang-li avatar Sep 14 '22 06:09 taiyang-li

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[Gluten-${ISSUES_ID}] ${detailed message}

See also:

github-actions[bot] avatar Sep 14 '22 06:09 github-actions[bot]

please rebase to main and add some ut.

zzcclp avatar Sep 22 '22 01:09 zzcclp

please rebase to main and add some ut.

Sorry, I'am new to gluten, where do you advice to add uts, thanks!

taiyang-li avatar Sep 22 '22 03:09 taiyang-li

please refer to gluten-ut module or the test cases in https://github.com/oap-project/gluten/tree/main/backends-clickhouse/src/test/scala/io/glutenproject/execution

zzcclp avatar Sep 22 '22 08:09 zzcclp

please refer to gluten-ut module or the test cases in https://github.com/oap-project/gluten/tree/main/backends-clickhouse/src/test/scala/io/glutenproject/execution

It is done.

taiyang-li avatar Oct 14 '22 07:10 taiyang-li

this is duplicate with PR #451

zzcclp avatar Oct 19 '22 09:10 zzcclp

I've tested ltrim and rtrim functions for Spark SQL, and it does also work. @rui-mo

lviiii avatar Oct 20 '22 05:10 lviiii