[Java] Platform-Dependent Serialization of Primitive Arrays Causes Data Corruption

Open LouisLou2 opened this issue 4 months ago • 0 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

Version

latest commit

Component(s)

Java

Minimal reproduce step

Serializers for primitive arrays (e.g., IntArraySerializer) use methods like MemoryBuffer.writePrimitiveArray, which internally rely on Unsafe.copyMemory. This operation performs a raw memory copy that preserves the native byte order (endianness) of the host machine, making the serialization format platform-dependent.

As a result, multi-byte array elements are misread, and the data is silently corrupted, whenever serialized bytes are exchanged between systems with different endianness.
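To make the byte-order dependence concrete, here is a minimal sketch that uses only JDK classes (java.nio.ByteBuffer), not the library's own buffer. The class name ByteOrderDemo and its helper methods are made up for illustration. It shows that one int value has two different byte layouts, and that decoding with the wrong order yields a different number.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    /** Encodes one int into 4 bytes using the given byte order. */
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Decodes 4 bytes into one int using the given byte order. */
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x12345678;

        // Little-endian layout: least significant byte first (78 56 34 12).
        byte[] littleEndian = encode(value, ByteOrder.LITTLE_ENDIAN);
        // Big-endian layout: most significant byte first (12 34 56 78).
        byte[] bigEndian = encode(value, ByteOrder.BIG_ENDIAN);

        System.out.printf("LE first byte: %02X, BE first byte: %02X%n",
                littleEndian[0], bigEndian[0]);

        // Reading little-endian bytes with a big-endian reader gives 0x78563412.
        System.out.printf("Misread value: 0x%08X%n",
                decode(littleEndian, ByteOrder.BIG_ENDIAN));
    }
}
```

A raw memory copy behaves like encode called with ByteOrder.nativeOrder(), which is why the output depends on the machine that wrote it.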

The following test demonstrates the issue by writing an int array on a little-endian machine and simulating how a big-endian machine would read it.

import org.apache.fury.memory.MemoryBuffer;
import org.apache.fury.platform.Platform;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessTest {
    public static void main(String[] args) {
        // Assume this code runs on a common Little-Endian machine (e.g., x86)
        System.out.println("Native Byte Order: " + ByteOrder.nativeOrder());
        MemoryBuffer buffer = MemoryBuffer.newHeapBuffer(32);

        // 1. Define an integer with a non-symmetrical byte pattern.
        int[] originalArray = new int[]{0x12345678};

        // 2. Write its raw bytes to the buffer, simulating the serializer's behavior.
        // This preserves the native (little-endian) byte order.
        buffer.writePrimitiveArray(originalArray, Platform.INT_ARRAY_OFFSET, 4);
        byte[] serializedBytes = buffer.getBytes(0, 4);
        System.out.println("Serialized bytes (hex): " + bytesToHex(serializedBytes));

        // 3. Simulate a Big-Endian machine reading these bytes.
        ByteBuffer bigEndianReader = ByteBuffer.wrap(serializedBytes);
        bigEndianReader.order(ByteOrder.BIG_ENDIAN);
        int deserializedValue = bigEndianReader.getInt();

        // 4. Compare the results.
        System.out.printf("Original value: 0x%08X%n", originalArray[0]);
        System.out.printf("Value interpreted on a Big-Endian machine: 0x%08X%n", deserializedValue);

        if (originalArray[0] != deserializedValue) {
            System.err.println("\nData Corruption Detected!");
        }
    }

    private static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }
}

What did you expect to see?

The deserialized value should be identical to the original value, ensuring data portability across all platforms.

Original value: 0x12345678
Value interpreted on a Big-Endian machine: 0x12345678
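
One way to get this platform-independent behavior is to fix the wire byte order explicitly instead of inheriting it from the host. The sketch below is not the library's implementation and is not a proposed patch. It uses only JDK classes, and the names PortableIntArrayCodec and WIRE_ORDER, as well as the choice of little-endian as the wire order, are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PortableIntArrayCodec {
    // Assumption for this sketch: the wire format is always little-endian.
    static final ByteOrder WIRE_ORDER = ByteOrder.LITTLE_ENDIAN;

    /** Writes the ints in WIRE_ORDER regardless of the host's native order. */
    static byte[] write(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES).order(WIRE_ORDER);
        // The int view inherits the byte order set on the underlying buffer.
        buf.asIntBuffer().put(values);
        return buf.array();
    }

    /** Reads ints that were written in WIRE_ORDER. */
    static int[] read(byte[] bytes) {
        int[] values = new int[bytes.length / Integer.BYTES];
        ByteBuffer.wrap(bytes).order(WIRE_ORDER).asIntBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        int[] original = {0x12345678};
        int[] roundTripped = read(write(original));
        System.out.printf("Round-tripped value: 0x%08X%n", roundTripped[0]);
    }
}
```

Because both sides agree on WIRE_ORDER, the reader recovers 0x12345678 whether it runs on a little-endian or a big-endian host.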

What did you see instead?

The bytes written in little-endian order were misinterpreted by the big-endian reader, resulting in a corrupted value.

Native Byte Order: LITTLE_ENDIAN
Serialized bytes (hex): 78 56 34 12
Original value: 0x12345678
Value interpreted on a Big-Endian machine: 0x78563412

Data Corruption Detected!
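
The corrupted value is exactly the original with its four bytes reversed, which is what Integer.reverseBytes computes. That suggests one possible way to keep the fast raw copy on little-endian hosts: swap bytes per element only when the host is big-endian. The following is a hedged sketch with hypothetical names (SwapOnBigEndian, toLittleEndianLayout, normalize), not the library's API, and it again assumes a little-endian wire format.

```java
import java.nio.ByteOrder;

public class SwapOnBigEndian {
    /**
     * Returns the value to hand to a raw memory copy so that the resulting
     * bytes are little-endian, given the byte order of the host doing the copy.
     */
    static int toLittleEndianLayout(int value, ByteOrder hostOrder) {
        return hostOrder == ByteOrder.BIG_ENDIAN ? Integer.reverseBytes(value) : value;
    }

    /** Applies the per-element swap to a copy of the array; the input is not modified. */
    static int[] normalize(int[] values, ByteOrder hostOrder) {
        int[] out = values.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] = toLittleEndianLayout(out[i], hostOrder);
        }
        return out;
    }

    public static void main(String[] args) {
        // The misread value from the report is the byte-reversed original.
        System.out.printf("reverseBytes(0x12345678) = 0x%08X%n",
                Integer.reverseBytes(0x12345678));

        // On a little-endian host, normalize leaves the values untouched.
        int[] normalized = normalize(new int[] {0x12345678}, ByteOrder.nativeOrder());
        System.out.printf("Value passed to the raw copy on this host: 0x%08X%n", normalized[0]);
    }
}
```

The same swap applied on the reading side makes the operation symmetric, since reversing the bytes twice restores the original value.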

Anything Else?

No response

Are you willing to submit a PR?

  • [x] I'm willing to submit a PR!

LouisLou2, Aug 02 '25 10:08