HIVE-27370: Support 4-byte characters
What changes were proposed in this pull request?
When an argument to the SUBSTR UDF contains 4-byte characters, the vectorized and non-vectorized implementations behave differently. The vectorized version handles 4-byte characters correctly, but the non-vectorized version does not, so it needs similar logic.
The fix follows the logic of the vectorized implementations:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java#L89-L130
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java#L78-L109
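For readers who do not want to follow the links, here is a minimal sketch of the general technique of counting characters directly on UTF-8 bytes by skipping continuation bytes (those of the form `10xxxxxx`). It is an illustration only and is not copied from the Hive sources: the class name `Utf8SubstrSketch`, the helper `substr`, and its 1-based `start` convention are assumptions made for this example, and negative start positions are not handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Hive's implementation.
class Utf8SubstrSketch {

    // Returns `length` characters beginning at the 1-based character
    // position `start`, counting characters rather than bytes.
    static String substr(byte[] utf8, int start, int length) {
        if (start < 1 || length <= 0) {
            return "";
        }
        long endChar = (long) start + length; // first character NOT included
        int charIndex = 0;                    // characters seen so far
        int beginByte = -1;                   // byte offset of the first kept character
        int endByte = utf8.length;            // byte offset just past the last kept character

        for (int i = 0; i < utf8.length; i++) {
            // A byte that is not a continuation byte starts a new character,
            // whether that character is 1, 2, 3, or 4 bytes long.
            if ((utf8[i] & 0xC0) != 0x80) {
                charIndex++;
                if (charIndex == start) {
                    beginByte = i;
                }
                if (charIndex == endChar) {
                    endByte = i;
                    break;
                }
            }
        }
        if (beginByte < 0) {
            return ""; // start is past the end of the string
        }
        return new String(utf8, beginByte, endByte - beginByte, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "a", U+1F600 (a 4-byte character in UTF-8), "b"
        byte[] input = "a\uD83D\uDE00b".getBytes(StandardCharsets.UTF_8);
        System.out.println(substr(input, 2, 1)); // the 4-byte character alone
        System.out.println(substr(input, 3, 5)); // "b"
    }
}
```

Because the loop only looks at lead bytes, a 4-byte character counts as one position, the same as an ASCII character.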
Why are the changes needed?
The vectorized and non-vectorized code paths return different results for the same SUBSTR call when the input contains 4-byte characters.
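As an illustration of how such a discrepancy can arise in Java, the sketch below compares indexing by UTF-16 code units (`String.substring`) with indexing by code points. It assumes, without confirmation from this PR, that a String-based code path counts positions in `char` units; the class and method names are hypothetical, and both helpers take a 0-based, non-negative `start` and a non-negative `length`.

```java
// Hypothetical demo, not taken from the Hive code base.
class CodePointSubstrDemo {

    // Indexes by UTF-16 code units: a 4-byte UTF-8 character occupies two
    // positions (a surrogate pair) and can be cut in half.
    static String substrByCodeUnits(String s, int start, int length) {
        if (start >= s.length()) {
            return "";
        }
        int end = (int) Math.min((long) start + length, s.length());
        return s.substring(start, end);
    }

    // Indexes by Unicode code points: every character occupies one position.
    static String substrByCodePoints(String s, int start, int length) {
        int total = s.codePointCount(0, s.length());
        if (start >= total) {
            return "";
        }
        int end = (int) Math.min((long) start + length, total);
        int beginIndex = s.offsetByCodePoints(0, start);
        int endIndex = s.offsetByCodePoints(beginIndex, end - start);
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // "a", U+1F600, "b"
        String units = substrByCodeUnits(s, 1, 1);
        String points = substrByCodePoints(s, 1, 1);
        // The code-unit version yields a lone high surrogate, not a full character.
        System.out.println(Character.isHighSurrogate(units.charAt(0)));
        System.out.println(points.equals("\uD83D\uDE00"));
    }
}
```

The two helpers agree on strings without supplementary characters, which is why a mismatch of this kind only shows up with 4-byte input.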
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added test cases to itest that cover these patterns and verify that they work correctly.
Quality Gate passed
The SonarCloud Quality Gate passed, but some issues were introduced.
2 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.
@ryukobayashi, could you please rebase?
@deniskuzZ I've updated it to the latest.
Quality Gate passed
Issues: 10 new issues, 0 accepted issues
Measures: 0 Security Hotspots, no data about Coverage, no data about Duplication
Hi @deniskuzZ, I would appreciate it if you could check this when you have time.
Sorry, I'll try to review ASAP, but I have many other PRs pending review. @SourabhBadhya, could you please check this?