incubator-gluten
incubator-gluten copied to clipboard
[VL] Remove the registry for Velox's prestosql scalar functions
What changes were proposed in this pull request?
Currently, Gluten registers Velox's prestosql scalar functions. Thus, some of them can be used for Spark SQL if there is no semantic difference. However, there are some conflict issues, which makes unexpected function calling. See below links. We plan to change Velox code to register presto scalar functions (see below list) that used by GLuten in spark's registry.
https://github.com/apache/incubator-gluten/discussions/4759 https://github.com/apache/incubator-gluten/pull/5150
Function list: url_decode/url_encode
How was this patch tested?
Existing tests.
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename commit message and pull request title in the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
See also:
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
query | log/native_5202_time.csv | log/native_master_04_09_2024_8e5f63c0c_time.csv | difference | percentage |
q1 | 37.13 | 36.38 | -0.750 | 97.98% |
q2 | 23.82 | 23.87 | 0.045 | 100.19% |
q3 | 37.27 | 36.63 | -0.634 | 98.30% |
q4 | 37.54 | 38.51 | 0.963 | 102.57% |
q5 | 70.57 | 71.53 | 0.968 | 101.37% |
q6 | 9.69 | 5.75 | -3.939 | 59.35% |
q7 | 81.58 | 85.53 | 3.950 | 104.84% |
q8 | 86.21 | 85.59 | -0.627 | 99.27% |
q9 | 125.65 | 119.02 | -6.626 | 94.73% |
q10 | 47.97 | 43.75 | -4.219 | 91.21% |
q11 | 19.86 | 20.38 | 0.524 | 102.64% |
q12 | 25.90 | 26.91 | 1.003 | 103.87% |
q13 | 47.71 | 48.54 | 0.833 | 101.75% |
q14 | 24.01 | 22.64 | -1.373 | 94.28% |
q15 | 31.92 | 31.84 | -0.082 | 99.74% |
q16 | 12.49 | 13.80 | 1.313 | 110.52% |
q17 | 103.62 | 101.51 | -2.105 | 97.97% |
q18 | 142.55 | 143.19 | 0.640 | 100.45% |
q19 | 15.68 | 13.58 | -2.107 | 86.57% |
q20 | 27.11 | 28.55 | 1.437 | 105.30% |
q21 | 227.72 | 226.97 | -0.753 | 99.67% |
q22 | 13.85 | 13.92 | 0.069 | 100.50% |
total | 1249.85 | 1238.38 | -11.469 | 99.08% |
Hi, could you share the current status?
Hi, could you share the current status?
@Yohahaha, this pr depends on a velox pr: https://github.com/facebookincubator/velox/pull/9425. I will try to push the review progress.
Hi, could you share the current status?
@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.
why not choose below usage? I think velox has already support register with prefix, the only thing we need to do is choose which prefixed-function is gluten needed. correct me if it's wrong, thank you!
velox::functions::prestosql::registerAllScalarFunctions("presto_");
velox::functions::sparksql::registerFunctions("spark_");
prestoMap("array_distinct" -> "presto");
// do naming conversion in cpp module
Hi, could you share the current status?
@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.
why not choose below usage? I think velox has already support register with prefix, the only thing we need to do is choose which prefixed-function is gluten needed. correct me if it's wrong, thank you!
velox::functions::prestosql::registerAllScalarFunctions("presto_"); velox::functions::sparksql::registerFunctions("spark_"); prestoMap("array_distinct" -> "presto"); // do naming conversion in cpp module
They are discussed internally. We feel it is more clear by using only one registry and we want to avoid maintaining such a map in Gluten.