HIVE-27370: support 4 bytes characters
What changes were proposed in this pull request?
If a SUBSTR UDF has a 4-byte characters in its parameter, the behavior is different between vectorized and non-vectorized. The vectorized version handles 4-byte characters properly, but the non-vectorized version does not, so similar logic is needed.
And these fixes use vectorized logic:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java#L89-L130
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java#L78-L109
Why are the changes needed?
Vectorized and non-vectorized have different results.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added pattern tests to itest for these to work correctly.
Quality Gate passed
Issues
3 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@difin, please take a look when you have time
@deniskuzZ Sorry, I also overlooked this comments. I fixed it.
Quality Gate passed
Issues
3 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
Hi @ryukobayashi, apologies for the slow progress on this PR. However, after noticing some obvious typos, my confidence in the changes has dropped. Overall, the code looks OK, but It needs a thorough review, which I unfortunately don’t have time for at the moment. Since @okumin is your colleague, perhaps he could help move this forward.
Sure. I was not confident that I should review coworker's patch to avoid bias. I will check this anyway.
Looks good to me, but we're running additional in-house integration tests just in case.
Quality Gate passed
Issues
2 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
Update: I want to run some additional tests, but I haven't taken the time. I will update the status in 2 weeks
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.