test_format_compatibility fails on big-endian systems

Open QuLogic opened this issue 2 years ago • 4 comments

Problem description

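When running on a big-endian system, test_format_compatibility fails due to an endian mismatch: the array built in the test has dtype '>U4', while the array read back from the fixture has dtype '<U4'.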

__________________________ test_format_compatibility ___________________________
    def test_format_compatibility():
    
        # This test is intended to catch any unintended changes that break the ability to
        # read data stored with a previous minor version (which should be format-compatible).
    
        # fixture data
        fixture = group(store=DirectoryStore('fixture'))
    
        # set seed to get consistent random data
        np.random.seed(42)
    
        arrays_chunks = [
            (np.arange(1111, dtype='<i1'), 100),
            (np.arange(1111, dtype='<i2'), 100),
            (np.arange(1111, dtype='<i4'), 100),
            (np.arange(1111, dtype='<i8'), 1000),
            (np.random.randint(0, 200, size=2222, dtype='u1').astype('<u1'), 100),
            (np.random.randint(0, 2000, size=2222, dtype='u2').astype('<u2'), 100),
            (np.random.randint(0, 2000, size=2222, dtype='u4').astype('<u4'), 100),
            (np.random.randint(0, 2000, size=2222, dtype='u8').astype('<u8'), 100),
            (np.linspace(0, 1, 3333, dtype='<f2'), 100),
            (np.linspace(0, 1, 3333, dtype='<f4'), 100),
            (np.linspace(0, 1, 3333, dtype='<f8'), 100),
            (np.random.normal(loc=0, scale=1, size=4444).astype('<f2'), 100),
            (np.random.normal(loc=0, scale=1, size=4444).astype('<f4'), 100),
            (np.random.normal(loc=0, scale=1, size=4444).astype('<f8'), 100),
            (np.random.choice([b'A', b'C', b'G', b'T'],
                              size=5555, replace=True).astype('S'), 100),
            (np.random.choice(['foo', 'bar', 'baz', 'quux'],
                              size=5555, replace=True).astype('<U'), 100),
            (np.random.choice([0, 1/3, 1/7, 1/9, np.nan],
                              size=5555, replace=True).astype('<f8'), 100),
            (np.random.randint(0, 2, size=5555, dtype=bool), 100),
            (np.arange(20000, dtype='<i4').reshape(2000, 10, order='C'), (100, 3)),
            (np.arange(20000, dtype='<i4').reshape(200, 100, order='F'), (100, 30)),
            (np.arange(20000, dtype='<i4').reshape(200, 10, 10, order='C'), (100, 3, 3)),
            (np.arange(20000, dtype='<i4').reshape(20, 100, 10, order='F'), (10, 30, 3)),
            (np.arange(20000, dtype='<i4').reshape(20, 10, 10, 10, order='C'), (10, 3, 3, 3)),
            (np.arange(20000, dtype='<i4').reshape(20, 10, 10, 10, order='F'), (10, 3, 3, 3)),
        ]
    
        compressors = [
            None,
            Zlib(level=1),
            BZ2(level=1),
            Blosc(cname='zstd', clevel=1, shuffle=0),
            Blosc(cname='zstd', clevel=1, shuffle=1),
            Blosc(cname='zstd', clevel=1, shuffle=2),
            Blosc(cname='lz4', clevel=1, shuffle=0),
        ]
    
        for i, (arr, chunks) in enumerate(arrays_chunks):
    
            if arr.flags.f_contiguous:
                order = 'F'
            else:
                order = 'C'
    
            for j, compressor in enumerate(compressors):
                path = '{}/{}'.format(i, j)
    
                if path not in fixture:  # pragma: no cover
                    # store the data - should be one-time operation
                    fixture.array(path, data=arr, chunks=chunks, order=order,
                                  compressor=compressor)
    
                # setup array
                z = fixture[path]
    
                # check contents
                if arr.dtype.kind == 'f':
                    assert_array_almost_equal(arr, z[:])
                else:
                    assert_array_equal(arr, z[:])
    
                # check dtype
>               assert arr.dtype == z.dtype
E               AssertionError: assert dtype('>U4') == dtype('<U4')
E                +  where dtype('>U4') = array(['foo', 'quux', 'quux', ..., 'bar', 'bar', 'quux'], dtype='>U4').dtype
E                +  and   dtype('<U4') = <zarr.core.Array '/15/0' (5555,) <U4>.dtype
zarr/tests/test_storage.py:2052: AssertionError
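
The dtype inequality in the assertion above can be reproduced on any platform by forcing the byte order explicitly. The following is a minimal sketch that assumes only NumPy; `same_dtype_ignoring_byteorder` is a hypothetical helper written for illustration and is not part of zarr or its test suite.

```python
import numpy as np

# Same strings as in the failing fixture entry.
words = np.array(['foo', 'bar', 'baz', 'quux'])

# Giving the width explicitly pins the byte order on every platform.
little = words.astype('<U4')
big = words.astype('>U4')

# The values are identical...
same_values = little.tolist() == big.tolist()

# ...but the dtypes differ only in byte order, which is enough for
# `arr.dtype == z.dtype` to be False.
dtypes_equal = little.dtype == big.dtype


def same_dtype_ignoring_byteorder(a, b):
    """Hypothetical helper: compare two dtypes after normalising both
    to the native byte order."""
    return a.newbyteorder('=') == b.newbyteorder('=')


print(same_values, dtypes_equal,
      same_dtype_ignoring_byteorder(little.dtype, big.dtype))
```

Whether the test should pin the width (for example `'<U4'`) or compare dtypes modulo byte order is a design choice for the maintainers; the sketch only shows that both options behave the same regardless of host endianness.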

Version and installation information

Please provide the following:

  • Value of zarr.__version__: 2.10.1
  • Value of numcodecs.__version__: 0.9.1
  • Version of Python interpreter: 3.9.7
  • Operating system: Fedora 34
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): from source

QuLogic, Oct 02 '21 05:10

Thanks, @QuLogic. I don't suppose you've had any insights into what's going on?

joshmoore, Nov 08 '21 14:11

Based on https://github.com/actions/virtual-environments/issues/2187, I assume GHA support for a big-endian system will arrive next year at the earliest. Shall we temporarily re-enable another platform such as CircleCI?

joshmoore, Nov 15 '21 11:11

See #869 for an attempt to use a qemu docker image.

joshmoore, Nov 15 '21 14:11

Unfortunately, these tests passed in https://github.com/zarr-developers/zarr-python/runs/4214212844?check_suite_focus=true, so the failure was not reproduced there :/

joshmoore, Nov 15 '21 16:11