openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

Vector lane reversal error on Big endian architectures

Open andreaTP opened this issue 1 month ago • 14 comments

Enabling testing on IBM Z chicory exposed an issue in the Vector API on Big endian architectures, it can be reproduced using this repo:

https://github.com/andreaTP/s390x-vector-bug-reproducer

Originally triggered by: https://github.com/IBM/actionspz/issues/5

I think this is a better place than IBM/actionspz to track progress on the issue. cc. @@offamitkumar

andreaTP avatar Oct 24 '25 15:10 andreaTP

FYI @gita-omr @r30shah @ehsankianifar

BradleyWood avatar Oct 24 '25 19:10 BradleyWood

I see byte order for registers is hardcoded to LITTLE_ENDIAN in jcl: https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/79bd3ea82fc7756b5340b8cb8f54b558260a9e63/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L51C1-L64C1

    /**
     * The order of vector bytes as stored in the register
     * file.  This becomes visible with the asRaw[Type]Vector
     * operations, which convert between the internal byte-wise
     * representation and the typed lane-wise representation.
     * It is very possible for a platform to have big-endian
     * memory layout and little-endian register layout,
     * so this is a different setting from NATIVE_ENDIAN.
     * In fact, both Intel and ARM use LE conventions here.
     * Future work may be needed for resolutely BE platforms.
     */
    /*package-private*/
    static final ByteOrder REGISTER_ENDIAN = ByteOrder.LITTLE_ENDIAN;

Not sure if it is related to the behaviour that was observed here or not. I'll take a closer look.

ehsankianifar avatar Oct 27 '25 15:10 ehsankianifar

@ehsankianifar - Are you able to reproduce the failure using the test in https://github.com/andreaTP/s390x-vector-bug-reproducer?

r30shah avatar Oct 27 '25 16:10 r30shah

@ehsankianifar - Are you able to reproduce the failure using the test in https://github.com/andreaTP/s390x-vector-bug-reproducer?

Yes the test outcome is [8,6,12,10] in my testing with all flavours of java on S390x

ehsankianifar avatar Oct 27 '25 16:10 ehsankianifar

In reinterpret logic we simply reinterpret the vector type to the destination type: https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/79bd3ea82fc7756b5340b8cb8f54b558260a9e63/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java#L223-L230 When the source is a long value like 0x0000000100000002, on big endian platforms the destination vector would be {0x00000001, 0x00000002} but on Little endian systems it will be {0x00000002, 0x00000003} (lanes swapped). ~Isn't it the expected behaviour?~

ehsankianifar avatar Oct 27 '25 16:10 ehsankianifar

I am sure this fails with JIT off as well, but can you confirm. Also based on your hunch in https://github.com/eclipse-openj9/openj9/issues/22821#issuecomment-3452058746 - Is it possible to change ByteOrder to Big Endian and see if test passes or not.

I would recommend you to share the detailed stack being executed when LongVector being reinterpreted as Int.

When the source is a long value like 0x0000000100000002, on big endian platforms the destination vector would be {0x00000001, 0x00000002} but on Little endian systems it will be {0x00000002, 0x00000003} (lanes swapped). Isn't it the expected behaviour?

What do you mean? Here is what the spec says about that method - https://docs.oracle.com/en/java/javase/21/docs/api/jdk.incubator.vector/jdk/incubator/vector/LongVector.html#reinterpretAsInts()

r30shah avatar Oct 27 '25 16:10 r30shah

I tested changing the register endianness and it doesn't change the test result. The result is the same in both jitted and interpreted (-Xint) runs.

ehsankianifar avatar Oct 27 '25 16:10 ehsankianifar

The Java AbstractVector class includes a property named REGISTER_ENDIAN, which defaults to LITTLE_ENDIAN, as previously noted in https://github.com/eclipse-openj9/openj9/issues/22821#issuecomment-3452058746. All conversion operations within the class assume a little-endian register configuration, and there are assertions in place that fail if this property is modified. For example, the reinterpretAsInts() method eventually invokes asVectorRawTemplate(), which explicitly states that it adheres to the register endian order which is little-endian.

    final AbstractVector<?> asVectorRawTemplate(LaneType laneType) {
        // NOTE:  This assumes that convert0('X')
        // respects REGISTER_ENDIAN order.
        return convert0('X', vspecies().withLanes(laneType));
    }

The convert0() method, called by asVectorRawTemplate(), performs a type reinterpretation under the assumption of little-endian ordering. It contains an assertion that fails when the register endian configuration is changed, and even if the assertion is removed, there is no logic implemented to handle big-endian scenarios.

    AbstractVector<F> convert0(char kind, AbstractSpecies<F> rsp) {
        // Derive some JIT-time constants:
        Class<?> vtype;
        Class<?> etype;   // fill in after switch (constant)
        int vlength;      // fill in after switch (mark type profile?)
        Class<?> rvtype;  // fill in after switch (mark type profile)
        Class<?> rtype;
        int rlength;
        switch (kind) {
        case 'Z':  // lane-wise size change, maybe with sign clip
            AbstractSpecies<?> rspi = rsp.asIntegral();
            AbstractSpecies<?> vsp = this.vspecies();
            AbstractSpecies<?> vspi = vsp.asIntegral();
            AbstractVector<?> biti = vspi == vsp ? this : this.convert0('X', vspi);
            rtype = rspi.elementType();
            rlength = rspi.laneCount();
            etype = vspi.elementType();
            vlength = vspi.laneCount();
            rvtype = rspi.dummyVector().getClass();
            vtype = vspi.dummyVector().getClass();
            int opc = vspi.elementSize() < rspi.elementSize() ? VectorSupport.VECTOR_OP_UCAST : VectorSupport.VECTOR_OP_CAST;
            AbstractVector<?> bitv = VectorSupport.convert(opc,
                    vtype, etype, vlength,
                    rvtype, rtype, rlength,
                    biti, rspi,
                    AbstractVector::defaultUCast);
            return (rspi == rsp ? bitv.check0(rsp) : bitv.convert0('X', rsp));
        case 'C':  // lane-wise cast (but not identity)
            rtype = rsp.elementType();
            rlength = rsp.laneCount();
            etype = this.elementType(); // (profile)
            vlength = this.length();  // (profile)
            rvtype = rsp.dummyVector().getClass();  // (profile)
            vtype = this.getClass();
            return VectorSupport.convert(VectorSupport.VECTOR_OP_CAST,
                    vtype, etype, vlength,
                    rvtype, rtype, rlength,
                    this, rsp,
                    AbstractVector::defaultCast);
        case 'X':  // reinterpret cast, not lane-wise if lane sizes differ
            rtype = rsp.elementType();
            rlength = rsp.laneCount();
            etype = this.elementType(); // (profile)
            vlength = this.length();  // (profile)
            rvtype = rsp.dummyVector().getClass();  // (profile)
            vtype = this.getClass();
            return VectorSupport.convert(VectorSupport.VECTOR_OP_REINTERPRET,
                    vtype, etype, vlength,
                    rvtype, rtype, rlength,
                    this, rsp,
                    AbstractVector::defaultReinterpret);
        }
        throw new AssertionError();
    }

    static {
        // Recode uses of VectorSupport.reinterpret if this assertion fails:
        assert(REGISTER_ENDIAN == ByteOrder.LITTLE_ENDIAN);
    }

Maintaining register values in little-endian format is not feasible on big-endian systems, as arithmetic operations would not function correctly. While it is technically possible to introduce lane-swapping logic during type conversions on big-endian platforms, such an approach would be inefficient. Notably, there is a comment in AbstractVector.java stating: "Future work may be needed for resolutely BE platforms." This suggests that the reported issue may be contingent on that future development. In conclusion, this appears not to be a bug in the runtime itself. Rather, the generated bytecode does not currently support big-endian configurations.

ehsankianifar avatar Oct 28 '25 15:10 ehsankianifar

Issue Summary

The AbstractVector class in the Java Vector API defines a static property REGISTER_ENDIAN which is hardcoded to ByteOrder.LITTLE_ENDIAN. This assumption is used in various reinterpretation methods (e.g., reinterpretAsInts, reinterpretAsLongs, etc.) to determine how vector lanes are reinterpreted across different element types. However, this design introduces incorrect behavior on big-endian platforms. Specifically, when reinterpretation operations are performed, the resulting vector lanes are swapped or misaligned, because the logic assumes little-endian ordering regardless of the actual platform endianness.

Impact

On big-endian systems (e.g., IBM z/Architecture), reinterpretation operations yield incorrect results. This affects correctness in applications relying on precise lane-level reinterpretation. It also introduces portability issues for Java applications using the Vector API across different architectures.

Technical Details

The field REGISTER_ENDIAN is defined as:

static final ByteOrder REGISTER_ENDIAN = ByteOrder.LITTLE_ENDIAN;

This value is used in methods like:

public IntVector reinterpretAsInts() { ... }
public LongVector reinterpretAsLongs() { ... }

These methods assume that the underlying byte layout of the vector matches little-endian ordering, which is not valid on big-endian systems. As a result, the reinterpretation logic does not reflect the actual memory layout, leading to incorrect lane values.

How to Reproduce

This issue was originally reported by @andreaTP , who created https://github.com/andreaTP/s390x-vector-bug-reproducer demonstrating the problem. I acknowledge and appreciate his contribution. Here is a simplified reproduction code:


// compile: javac  --add-modules jdk.incubator.vector Reproducer.java
// run:     java   --add-modules jdk.incubator.vector Reproducer

import jdk.incubator.vector.*;

public class Reproducer {
    public static void main(String[] args) {
        LongVector v1 = LongVector.fromArray(LongVector.SPECIES_128,
            new long[]{(2L << 32) | 1L, (4L << 32) | 3L}, 0);
        // v1 = {0x0000000200000001, 0x0000000400000003}

        IntVector result = v1.reinterpretAsInts();
        // Expected result: [1, 2, 3, 4]
        // Actual result on big-endian platforms: [2, 1, 4, 3]

        System.out.println(result);
    }
}

Important Clarification

This issue is not specific to any particular Java vendor. It stems from a design decision in the shared Vector API implementation under the jdk.incubator.vector module, which is part of the OpenJDK project. Any Java runtime that includes this module and runs on a big-endian platform is likely to exhibit the same incorrect behavior due to the hardcoded little-endian assumption in the AbstractVector class. @r30shah FYI

ehsankianifar avatar Oct 30 '25 16:10 ehsankianifar

@ehsankianifar great work narrowing down the issue!

This issue was originally reported by ...

Appreciated, but no need to carry around attributions 🙏

OpenJDK project

I never reported an issue to OpenJDK, if someone want to step up and forward the problem I'd love to have a link to the open issue for tracking purposes.

andreaTP avatar Oct 30 '25 17:10 andreaTP

I would appreciate if @offamitkumar can help us here with OpenJDK side conversation.

r30shah avatar Oct 30 '25 17:10 r30shah

I am convinced that this needs to be fixed as I was little confused if this is expected behaviour. I have opened an issue on JBS (to be fixed in openjdk) : https://bugs.openjdk.org/browse/JDK-8371187

I guess @varada1110 is working on it actively. We will raise the fix ASAP.

offamitkumar avatar Nov 04 '25 04:11 offamitkumar

@offamitkumar, @varada1110, @r30shah do you think this issue will be resolved in time for the OpenJ9 0.57 release? If not, perhaps this should be moved to the 0.59 release.

hzongaro avatar Dec 01 '25 13:12 hzongaro

I see that the PR1 is in draft state, and we are totally dependent on community, to approve it. Once Varada comes back from vacation she will mark PR ready for review and we can try to get it merged quickly.

offamitkumar avatar Dec 02 '25 09:12 offamitkumar