openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

AArch64: Inline StringUTF16.compress([CI[BII)I

Open knn-k opened this issue 1 year ago • 12 comments

This commit recognizes StringUTF16.compress([CI[BII)I, and generates inlined code for AArch64.

knn-k avatar Sep 24 '24 06:09 knn-k

Jenkins test sanity alinux64,amac jdk17

knn-k avatar Sep 24 '24 06:09 knn-k

Jenkins test sanity alinux64,amac jdk11

knn-k avatar Sep 24 '24 09:09 knn-k

Jenkins test sanity alinux64,amac jdk17

knn-k avatar Sep 24 '24 10:09 knn-k

Jenkins test sanity amac jdk11

knn-k avatar Sep 24 '24 11:09 knn-k

This method is used only in Java 17+. Should I add #if JAVA_SPEC_VERSION >= 17 to the code?

knn-k avatar Sep 25 '24 01:09 knn-k

The Java method copies elements of a char[] array to a byte[] array. If any one of the array elements is larger than 255, the method fails and returns 0. https://github.com/ibmruntimes/openj9-openjdk-jdk17/blob/764b7257fe90012db56000a556f2b835542a6063/src/java.base/share/classes/java/lang/StringUTF16.java#L176-L190

The loop in the vectorized code loads 16 elements from the char[] array at once, using two SIMD registers. It then collects the upper bytes of the char elements using the uzp2 instruction, and checks the result to see if there is any element larger than 255 or not. If no element exceeds 255, the code collects the lower bytes using the uzp1 instruction, and stores the result to the byte[] array. When there are 15 elements or less remaining, the code copies the array by 8-, 4-, 2-, and 1-elements using a single SIMD register. The instructions for collecting the upper and lower bytes are trn2 and xtn, respectively, instead of uzp2 and uzp1.

knn-k avatar Sep 25 '24 13:09 knn-k

For Z we can borrow some logic / refactor code from [1] with some changes to handle return value, which was used to accelerate implEncode* intrinsics [2].

@knn-k Let me know if you want to take a crack at this / need some help from Z team.

[1]. https://github.com/eclipse/omr/blob/9b3e65e264a66efca05c8ab283a81686ef03c876/compiler/z/codegen/OMRTreeEvaluator.cpp#L15390 [2]. https://github.com/ibmruntimes/openj9-openjdk-jdk17/blob/764b7257fe90012db56000a556f2b835542a6063/src/java.base/share/classes/java/lang/StringCoding.java#L47-L58

r30shah avatar Sep 25 '24 21:09 r30shah

Jenkins test sanity alinux64,amac jdk11,jdk17

knn-k avatar Oct 04 '24 11:10 knn-k

I updated the code:

  • Improved the performance by changing the checks of the failure condition
  • Added comments
  • Added #if JAVA_SPEC_VERSION >= 11 - #endif: StringUTF16 class is available in Java 11+, but it is not in Java 8.

knn-k avatar Oct 04 '24 11:10 knn-k

Jenkins compile alinux64 jdk8

knn-k avatar Oct 04 '24 11:10 knn-k

Because StringUTF16.compress() is a public static method, it can be called from anywhere with any inputs. The Java implementation will throw NPEs and AIOBs if insufficiently sized source/dest arrays are passed in. Unless VP is able to prove otherwise, I think you will need to null and bound check the arrays before entering your intrinsified code. What we've done in other situations like this is to do those checks in the mainline code, and if they fail then branch to an OOL section that calls the original, unintrinsified Java method where the appropriate exception would be thrown.

My other comment is around testing and how you ensured the different sections (i.e., source array lengths) of your intrinsic were exercised. It's worth creating a unit test if one doesn't already exist.

Otherwise, the actual logic itself looks fine.

0xdaryl avatar Oct 08 '24 20:10 0xdaryl

I wrote a program and used it for making sure that the StringUTF16.compress() works as expected for various offsets and lengths. I need to rewrite its code for running it as a unit test.

knn-k avatar Oct 09 '24 07:10 knn-k

This commit is superseded by #20406 and https://github.com/eclipse/omr/pull/7499 that implement arraytranslateTRTO255.

knn-k avatar Oct 24 '24 07:10 knn-k