hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-27370: support 4 bytes characters

Open ryukobayashi opened this issue 1 year ago • 4 comments

What changes were proposed in this pull request?

If a SUBSTR UDF has a 4-byte characters in its parameter, the behavior is different between vectorized and non-vectorized. The vectorized version handles 4-byte characters properly, but the non-vectorized version does not, so similar logic is needed. And these fixes use vectorized logic: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java#L89-L130 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java#L78-L109

Why are the changes needed?

Vectorized and non-vectorized have different results.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added pattern tests to itest for these to work correctly.

ryukobayashi avatar Jan 28 '25 05:01 ryukobayashi

@difin, please take a look when you have time

deniskuzZ avatar Feb 01 '25 18:02 deniskuzZ

@deniskuzZ Sorry, I also overlooked this comments. I fixed it.

ryukobayashi avatar Apr 10 '25 10:04 ryukobayashi

Hi @ryukobayashi, apologies for the slow progress on this PR. However, after noticing some obvious typos, my confidence in the changes has dropped. Overall, the code looks OK, but It needs a thorough review, which I unfortunately don’t have time for at the moment. Since @okumin is your colleague, perhaps he could help move this forward.

deniskuzZ avatar Jun 26 '25 09:06 deniskuzZ

Sure. I was not confident that I should review coworker's patch to avoid bias. I will check this anyway.

okumin avatar Jun 27 '25 11:06 okumin

Looks good to me, but we're running additional in-house integration tests just in case.

okumin avatar Jul 04 '25 09:07 okumin

Update: I want to run some additional tests, but I haven't taken the time. I will update the status in 2 weeks

okumin avatar Jul 18 '25 12:07 okumin

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Sep 17 '25 00:09 github-actions[bot]