incubator-gluten icon indicating copy to clipboard operation
incubator-gluten copied to clipboard

[VL] Remove the registry for Velox's prestosql scalar functions

Open PHILO-HE opened this issue 10 months ago • 1 comments

What changes were proposed in this pull request?

Currently, Gluten registers Velox's prestosql scalar functions. Thus, some of them can be used for Spark SQL if there is no semantic difference. However, there are some conflict issues, which makes unexpected function calling. See below links. We plan to change Velox code to register presto scalar functions (see below list) that used by GLuten in spark's registry.

https://github.com/apache/incubator-gluten/discussions/4759 https://github.com/apache/incubator-gluten/pull/5150

Function list: url_decode/url_encode

How was this patch tested?

Existing tests.

PHILO-HE avatar Mar 29 '24 07:03 PHILO-HE

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions[bot] avatar Mar 29 '24 07:03 github-actions[bot]

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5202_time.csv log/native_master_04_09_2024_8e5f63c0c_time.csv difference percentage
q1 37.13 36.38 -0.750 97.98%
q2 23.82 23.87 0.045 100.19%
q3 37.27 36.63 -0.634 98.30%
q4 37.54 38.51 0.963 102.57%
q5 70.57 71.53 0.968 101.37%
q6 9.69 5.75 -3.939 59.35%
q7 81.58 85.53 3.950 104.84%
q8 86.21 85.59 -0.627 99.27%
q9 125.65 119.02 -6.626 94.73%
q10 47.97 43.75 -4.219 91.21%
q11 19.86 20.38 0.524 102.64%
q12 25.90 26.91 1.003 103.87%
q13 47.71 48.54 0.833 101.75%
q14 24.01 22.64 -1.373 94.28%
q15 31.92 31.84 -0.082 99.74%
q16 12.49 13.80 1.313 110.52%
q17 103.62 101.51 -2.105 97.97%
q18 142.55 143.19 0.640 100.45%
q19 15.68 13.58 -2.107 86.57%
q20 27.11 28.55 1.437 105.30%
q21 227.72 226.97 -0.753 99.67%
q22 13.85 13.92 0.069 100.50%
total 1249.85 1238.38 -11.469 99.08%

GlutenPerfBot avatar Apr 10 '24 08:04 GlutenPerfBot

Hi, could you share the current status?

Yohahaha avatar Apr 11 '24 09:04 Yohahaha

Hi, could you share the current status?

@Yohahaha, this pr depends on a velox pr: https://github.com/facebookincubator/velox/pull/9425. I will try to push the review progress.

PHILO-HE avatar Apr 11 '24 09:04 PHILO-HE

Hi, could you share the current status?

@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.

why not choose below usage? I think velox has already support register with prefix, the only thing we need to do is choose which prefixed-function is gluten needed. correct me if it's wrong, thank you!

  velox::functions::prestosql::registerAllScalarFunctions("presto_");
  velox::functions::sparksql::registerFunctions("spark_");

  prestoMap("array_distinct" -> "presto");
  // do naming conversion in cpp module

Yohahaha avatar Apr 11 '24 09:04 Yohahaha

Hi, could you share the current status?

@Yohahaha, this pr depends on a velox pr: facebookincubator/velox#9425. I will try to push the review progress.

why not choose below usage? I think velox has already support register with prefix, the only thing we need to do is choose which prefixed-function is gluten needed. correct me if it's wrong, thank you!

  velox::functions::prestosql::registerAllScalarFunctions("presto_");
  velox::functions::sparksql::registerFunctions("spark_");

  prestoMap("array_distinct" -> "presto");
  // do naming conversion in cpp module

They are discussed internally. We feel it is more clear by using only one registry and we want to avoid maintaining such a map in Gluten.

PHILO-HE avatar Apr 11 '24 10:04 PHILO-HE