[VL] Validate unsupported rewrite string in Re2

Open kecookier opened this issue 1 year ago • 10 comments

What changes were proposed in this pull request?

Currently only the regex pattern is validated, but the rewrite string needs to be validated as well. The regexp_replace function calls RE2::GlobalReplace(), which swallows the errors reported by RE2::Rewrite().

When RE2::CheckRewriteString() fails, Gluten will fall back to vanilla Spark and print a log like:

24/07/02 22:21:38 INFO GlutenFallbackReporter: Validation failed for plan: Project, due to: native check failure:native validation failed for function: regexp_replace due to: Rewrite \[check failed in RE2. Reason: Rewrite schema error: '\' must be followed by a digit or '\'..
24/07/02 22:21:38 INFO GlutenFallbackReporter: appId=local-c0csj_b8MAcjJdWfT2zfiQ, containerId=null, jsonStr={"plan":"Project","reason":"native check failure:native validation failed for function: regexp_replace due to: Rewrite \\[check failed in RE2. Reason: Rewrite schema error: '\\' must be followed by a digit or '\\'."}
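
The rule behind the error message in the log above can be illustrated with a small sketch. This is not the RE2 or Gluten implementation: `check_rewrite_string` is a hypothetical helper, Python's `re` module stands in for RE2 only to count capture groups, and the group-count check is an assumption about what a rewrite validator would typically also enforce.

```python
import re
from typing import Optional


def check_rewrite_string(pattern: str, rewrite: str) -> Optional[str]:
    """Return None if the rewrite string looks valid, else an error message.

    Hypothetical helper for illustration; the PR relies on
    RE2::CheckRewriteString() for the real check.
    """
    # Assumption: Python's re is used here only to count capture groups.
    num_groups = re.compile(pattern).groups
    max_group = -1
    i = 0
    while i < len(rewrite):
        ch = rewrite[i]
        i += 1
        if ch != "\\":
            continue  # ordinary character, copied literally
        if i >= len(rewrite):
            return "Rewrite schema error: '\\' not allowed at end."
        ch = rewrite[i]
        i += 1
        if ch == "\\":
            continue  # escaped backslash
        if not ("0" <= ch <= "9"):
            return "Rewrite schema error: '\\' must be followed by a digit or '\\'."
        max_group = max(max_group, int(ch))
    if max_group > num_groups:
        return (
            f"Rewrite requests group {max_group}, "
            f"but the pattern has only {num_groups} capture group(s)."
        )
    return None
```

With this sketch, a rewrite string such as `\[` is rejected before any replacement runs, which is the point at which a fallback decision can be made instead of letting the replacement fail silently.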

(Fixes: #6224)

How was this patch tested?

Existing CI.

kecookier avatar Jul 02 '24 14:07 kecookier

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions[bot] avatar Jul 02 '24 14:07 github-actions[bot]

@GlutenPerfBot benchmark

zhouyuan avatar Jul 03 '24 09:07 zhouyuan

ACK, will benchmark TPCH/DS on this pull request

GlutenPerfBot avatar Jul 03 '24 09:07 GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_02_2024_6b6444e57e_time.csv difference percentage
q1 36.08 34.26 -1.817 94.96%
q2 24.09 22.32 -1.765 92.67%
q3 41.14 40.70 -0.433 98.95%
q4 33.63 33.11 -0.519 98.46%
q5 70.87 69.66 -1.216 98.28%
q6 6.58 7.95 1.373 120.88%
q7 81.31 83.02 1.710 102.10%
q8 82.29 84.00 1.714 102.08%
q9 124.70 122.20 -2.503 97.99%
q10 45.16 47.32 2.166 104.80%
q11 19.96 20.46 0.497 102.49%
q12 24.98 27.16 2.180 108.73%
q13 39.42 39.74 0.320 100.81%
q14 18.45 19.78 1.328 107.20%
q15 32.81 30.60 -2.208 93.27%
q16 14.01 14.01 0.004 100.03%
q17 104.91 102.42 -2.490 97.63%
q18 150.24 151.18 0.939 100.63%
q19 13.71 14.80 1.085 107.91%
q20 26.73 31.05 4.322 116.17%
q21 260.91 264.02 3.107 101.19%
q22 13.32 12.38 -0.939 92.95%
total 1265.30 1272.16 6.857 100.54%
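
For readers of the report above, the last two columns appear to be derived from the first two: the difference is the master time minus the pull request time, and the percentage is the master time divided by the pull request time. The sketch below reproduces that arithmetic for the q1 row; the function name `compare_times` is hypothetical, and the relationship is inferred from the printed values rather than from the bot's source.

```python
from typing import Tuple


def compare_times(pr_time: float, master_time: float) -> Tuple[float, float]:
    """Return (difference, percentage) as the report columns appear to define them."""
    difference = master_time - pr_time          # negative: master run was faster
    percentage = master_time / pr_time * 100.0  # above 100%: PR run was faster
    return difference, percentage


# q1 row from the TPCH report: PR 36.08, master 34.26
diff, pct = compare_times(36.08, 34.26)
print(f"{diff:.2f} {pct:.2f}%")
```

The printed report was computed from unrounded timings, so values recomputed from the rounded columns can differ in the last digit.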

GlutenPerfBot avatar Jul 03 '24 12:07 GlutenPerfBot

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_02_2024_6b6444e57_time.csv difference percentage
q1 14.50 14.94 0.436 103.01%
q2 14.76 14.59 -0.174 98.82%
q3 6.02 3.99 -2.038 66.17%
q4 63.14 64.79 1.650 102.61%
q5 6.37 8.99 2.622 141.18%
q6 3.58 2.43 -1.151 67.83%
q7 4.37 4.25 -0.120 97.26%
q8 4.55 4.94 0.396 108.72%
q9 19.46 18.15 -1.303 93.30%
q10 11.44 10.50 -0.933 91.84%
q11 35.39 37.08 1.684 104.76%
q12 1.46 2.38 0.920 163.00%
q13 6.60 5.72 -0.880 86.68%
q14a 41.18 43.96 2.775 106.74%
q14b 38.72 39.09 0.372 100.96%
q15 3.64 2.68 -0.954 73.76%
q16 40.17 41.98 1.809 104.50%
q17 4.89 5.93 1.037 121.18%
q18 6.39 6.27 -0.112 98.24%
q19 2.23 2.30 0.069 103.11%
q20 1.44 1.33 -0.105 92.67%
q21 1.02 1.10 0.082 108.06%
q22 8.27 8.35 0.075 100.90%
q23a 80.21 84.32 4.115 105.13%
q23b 99.88 103.47 3.591 103.60%
q24a 79.77 78.77 -0.998 98.75%
q24b 80.58 72.76 -7.816 90.30%
q25 4.31 4.39 0.077 101.79%
q26 4.16 2.96 -1.193 71.30%
q27 3.55 3.41 -0.141 96.03%
q28 21.08 21.17 0.085 100.40%
q29 6.66 7.08 0.419 106.29%
q30 9.59 4.09 -5.498 42.67%
q31 6.21 6.30 0.093 101.50%
q32 1.14 1.23 0.093 108.12%
q33 7.35 4.72 -2.628 64.23%
q34 5.89 6.86 0.977 116.59%
q35 7.72 7.65 -0.069 99.11%
q36 3.75 3.67 -0.082 97.81%
q37 4.08 4.66 0.577 114.15%
q38 11.94 14.27 2.329 119.51%
q39a 3.36 3.53 0.163 104.85%
q39b 3.10 2.90 -0.199 93.59%
q40 3.67 3.69 0.021 100.57%
q41 0.62 0.70 0.078 112.56%
q42 0.93 1.08 0.143 115.30%
q43 3.89 4.02 0.130 103.35%
q44 12.28 8.66 -3.617 70.54%
q45 3.35 8.26 4.915 246.79%
q46 3.39 3.48 0.088 102.60%
q47 14.20 14.36 0.157 101.10%
q48 4.26 4.60 0.335 107.86%
q49 9.41 9.33 -0.080 99.15%
q50 19.74 22.19 2.446 112.39%
q51 8.63 11.70 3.071 135.59%
q52 1.00 1.09 0.090 109.05%
q53 2.17 2.02 -0.156 92.81%
q54 3.27 3.32 0.043 101.31%
q55 1.01 1.16 0.144 114.22%
q56 4.38 4.58 0.198 104.51%
q57 8.55 8.80 0.246 102.88%
q58 2.57 2.67 0.106 104.14%
q59 13.72 13.99 0.266 101.94%
q60 4.83 4.89 0.054 101.11%
q61 5.58 5.49 -0.087 98.44%
q62 3.75 5.15 1.397 137.23%
q63 2.11 2.21 0.099 104.70%
q64 51.86 51.58 -0.284 99.45%
q65 13.56 14.11 0.552 104.07%
q66 8.71 4.75 -3.960 54.54%
q67 349.08 349.82 0.736 100.21%
q68 3.61 3.67 0.062 101.71%
q69 6.25 6.44 0.189 103.03%
q70 8.82 8.98 0.163 101.84%
q71 3.43 3.30 -0.131 96.18%
q72 184.79 187.54 2.745 101.49%
q73 2.29 2.34 0.053 102.31%
q74 21.04 21.84 0.800 103.80%
q75 25.76 23.36 -2.406 90.66%
q76 9.56 9.49 -0.063 99.34%
q77 2.46 2.15 -0.314 87.25%
q78 38.55 38.92 0.373 100.97%
q79 3.60 3.62 0.022 100.60%
q80 11.66 11.04 -0.622 94.67%
q81 5.33 5.17 -0.164 96.93%
q82 6.42 6.63 0.208 103.25%
q83 1.60 1.58 -0.016 99.02%
q84 2.80 2.81 0.009 100.32%
q85 7.03 7.01 -0.020 99.71%
q86 3.17 3.38 0.205 106.46%
q87 12.08 12.61 0.533 104.42%
q88 24.48 25.25 0.772 103.15%
q89 5.83 3.21 -2.619 55.06%
q90 3.90 3.86 -0.045 98.84%
q91 2.66 2.54 -0.117 95.59%
q92 1.35 1.32 -0.030 97.81%
q93 28.01 29.02 1.010 103.61%
q94 21.85 22.00 0.150 100.69%
q9 81.17 81.02 -0.150 99.82%
q5 3.96 3.83 -0.130 96.70%
q96 11.97 12.31 0.337 102.82%
q97 1.89 2.06 0.171 109.03%
q98 11.78 11.46 -0.326 97.24%
q99 11.78 11.46 -0.326 97.24%
total 1903.57 1911.40 7.832 100.41%

GlutenPerfBot avatar Jul 03 '24 14:07 GlutenPerfBot

ACK, will benchmark TPCH/DS on this pull request

GlutenPerfBot avatar Jul 04 '24 02:07 GlutenPerfBot

ACK, will benchmark TPCH/DS on this pull request

GlutenPerfBot avatar Jul 04 '24 12:07 GlutenPerfBot

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_02_2024_6b6444e57_time.csv difference percentage
q1 15.73 14.94 -0.787 94.99%
q2 14.91 14.59 -0.324 97.83%
q3 4.14 3.99 -0.153 96.30%
q4 62.86 64.79 1.933 103.08%
q5 7.30 8.99 1.692 123.18%
q6 3.91 2.43 -1.489 61.97%
q7 4.24 4.25 0.010 100.23%
q8 5.66 4.94 -0.714 87.37%
q9 23.23 18.15 -5.072 78.16%
q10 10.99 10.50 -0.489 95.55%
q11 35.10 37.08 1.977 105.63%
q12 1.58 2.38 0.802 150.86%
q13 7.75 5.72 -2.030 73.82%
q14a 41.60 43.96 2.356 105.66%
q14b 39.92 39.09 -0.830 97.92%
q15 2.52 2.68 0.164 106.51%
q16 39.61 41.98 2.371 105.99%
q17 5.60 5.93 0.331 105.91%
q18 7.25 6.27 -0.973 86.57%
q19 3.19 2.30 -0.890 72.09%
q20 1.37 1.33 -0.038 97.21%
q21 2.72 1.10 -1.624 40.40%
q22 8.47 8.35 -0.123 98.55%
q23a 82.25 84.32 2.070 102.52%
q23b 102.34 103.47 1.133 101.11%
q24a 73.68 78.77 5.089 106.91%
q24b 69.35 72.76 3.411 104.92%
q25 9.56 4.39 -5.175 45.87%
q26 2.97 2.96 -0.007 99.77%
q27 2.90 3.41 0.512 117.63%
q28 23.22 21.17 -2.051 91.17%
q29 6.57 7.08 0.512 107.79%
q30 4.06 4.09 0.033 100.80%
q31 6.12 6.30 0.175 102.87%
q32 1.15 1.23 0.083 107.18%
q33 4.85 4.72 -0.132 97.29%
q34 3.62 6.86 3.245 189.73%
q35 6.33 7.65 1.322 120.87%
q36 3.29 3.67 0.378 111.46%
q37 3.61 4.66 1.044 128.87%
q38 11.38 14.27 2.889 125.40%
q39a 3.34 3.53 0.189 105.66%
q39b 4.70 2.90 -1.798 61.72%
q40 3.84 3.69 -0.153 96.01%
q41 1.93 0.70 -1.229 36.24%
q42 0.98 1.08 0.099 110.09%
q43 3.91 4.02 0.112 102.86%
q44 8.89 8.66 -0.226 97.45%
q45 3.44 8.26 4.818 239.87%
q46 3.20 3.48 0.279 108.70%
q47 14.26 14.36 0.102 100.71%
q48 4.51 4.60 0.086 101.90%
q49 9.63 9.33 -0.301 96.88%
q50 23.64 22.19 -1.452 93.86%
q51 8.61 11.70 3.090 135.88%
q52 1.03 1.09 0.060 105.83%
q53 2.16 2.02 -0.142 93.42%
q54 3.49 3.32 -0.174 95.02%
q55 1.02 1.16 0.137 113.47%
q56 4.45 4.58 0.130 102.92%
q57 8.56 8.80 0.246 102.87%
q58 2.52 2.67 0.156 106.19%
q59 17.67 13.99 -3.684 79.16%
q60 5.11 4.89 -0.219 95.71%
q61 5.60 5.49 -0.109 98.06%
q62 4.55 5.15 0.594 113.05%
q63 2.22 2.21 -0.005 99.77%
q64 48.06 51.58 3.523 107.33%
q65 13.68 14.11 0.428 103.13%
q66 3.46 4.75 1.290 137.29%
q67 351.80 349.82 -1.981 99.44%
q68 3.74 3.67 -0.071 98.09%
q69 6.23 6.44 0.210 103.38%
q70 13.46 8.98 -4.474 66.75%
q71 4.70 3.30 -1.403 70.13%
q72 185.21 187.54 2.332 101.26%
q73 4.00 2.34 -1.654 58.63%
q74 21.32 21.84 0.521 102.44%
q75 23.46 23.36 -0.099 99.58%
q76 9.17 9.49 0.323 103.53%
q77 2.19 2.15 -0.045 97.93%
q78 42.29 38.92 -3.374 92.02%
q79 3.66 3.62 -0.036 99.02%
q80 11.46 11.04 -0.419 96.35%
q81 5.10 5.17 0.070 101.37%
q82 9.05 6.63 -2.414 73.31%
q83 1.51 1.58 0.080 105.28%
q84 2.76 2.81 0.051 101.83%
q85 6.86 7.01 0.149 102.17%
q86 4.22 3.38 -0.843 80.01%
q87 14.24 12.61 -1.631 88.55%
q88 24.16 25.25 1.097 104.54%
q89 3.21 3.21 0.002 100.07%
q90 8.54 3.86 -4.682 45.16%
q91 2.54 2.54 -0.001 99.97%
q92 1.38 1.32 -0.057 95.85%
q93 27.92 29.02 1.101 103.94%
q94 21.59 22.00 0.406 101.88%
q9 84.36 81.02 -3.340 96.04%
q5 3.85 3.83 -0.019 99.49%
q96 12.03 12.31 0.285 102.37%
q97 1.91 2.06 0.152 107.97%
q98 9.52 11.46 1.933 120.29%
q99 9.52 11.46 1.933 120.29%
total 1912.76 1911.40 -1.357 99.93%

GlutenPerfBot avatar Jul 04 '24 14:07 GlutenPerfBot

ACK, will benchmark TPCH/DS on this pull request

GlutenPerfBot avatar Jul 05 '24 00:07 GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6312_time.csv log/native_master_07_04_2024_ff0b4733a_time.csv difference percentage
q1 34.70 37.25 2.549 107.35%
q2 29.06 23.94 -5.122 82.38%
q3 38.82 40.88 2.065 105.32%
q4 35.88 32.31 -3.565 90.06%
q5 70.95 69.58 -1.370 98.07%
q6 7.78 8.08 0.299 103.85%
q7 86.65 84.15 -2.499 97.12%
q8 85.17 86.08 0.909 101.07%
q9 121.40 122.79 1.387 101.14%
q10 45.18 46.05 0.870 101.93%
q11 21.86 20.55 -1.314 93.99%
q12 25.89 27.84 1.954 107.55%
q13 39.27 39.73 0.461 101.17%
q14 20.12 18.88 -1.235 93.86%
q15 32.95 33.90 0.946 102.87%
q16 14.16 13.35 -0.809 94.29%
q17 104.77 105.50 0.733 100.70%
q18 147.18 149.41 2.230 101.52%
q19 14.61 13.77 -0.836 94.27%
q20 27.00 30.34 3.336 112.36%
q21 264.36 263.79 -0.566 99.79%
q22 14.24 12.38 -1.863 86.92%
total 1281.99 1280.56 -1.438 99.89%

GlutenPerfBot avatar Jul 05 '24 01:07 GlutenPerfBot

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Aug 21 '24 01:08 github-actions[bot]

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

github-actions[bot] avatar Aug 31 '24 01:08 github-actions[bot]