zarr.js
Support for float16
Background
Without support for the f2 dtype in Zarr.js, I need to check all of my arrays on the Python side to ensure that they will work with zarr.js downstream:
if arr.dtype.kind == 'f' and arr.dtype.itemsize == 2:
    arr = arr.astype('<f4')
Feature request
While JS does not have a Float16Array class, could Zarr.js load remote <f2 and >f2 arrays into Float32Arrays? In other words, is there something preventing adding the following lines to https://github.com/gzuidhof/zarr.js/blob/master/src/nestedArray/types.ts#L32? Perhaps only supported when mode: "r" for readOnly mode?
const DTYPE_TYPEDARRAY_MAPPING = {
    // ...
+   '<f2': Float32Array,
    '<f4': Float32Array,
    '<f8': Float64Array,
    '>b': Int8Array,
    '>B': Uint8Array,
    '>u1': Uint8Array,
    '>i1': Int8Array,
    '>u2': Uint16Array,
    '>i2': Int16Array,
    '>u4': Uint32Array,
    '>i4': Int32Array,
+   '>f2': Float32Array,
    '>f4': Float32Array,
    '>f8': Float64Array
};
Thanks for your patience with my response. TL;DR - It isn't possible to view a contiguous piece of memory that is f2 as f4 and have anything usable.
Unfortunately, I don't think it is as simple as adding more mappings to DTYPE_TYPEDARRAY_MAPPING. A TypedArray works by providing an array-like view of an underlying ArrayBuffer. In zarr.js, each array "chunk" is decompressed into a raw ArrayBuffer, and then the corresponding TypedArray is used to provide a view of that underlying binary data.
Each element in a Float32Array is 4 bytes of the underlying ArrayBuffer viewed as a 32-bit IEEE floating point number. It isn't until you try to access the data from this view (e.g., arr[0] or Array.from(arr)) that the values are coerced into JS Numbers. TypedArrays do a lot of the hard work for us because they provide this no-copy abstraction over the underlying binary data. Otherwise we'd need to manually parse the binary buffers ourselves into JS Array<number>.
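To make the "view" idea concrete, here is a minimal sketch in plain JavaScript (no zarr.js involved; the 8-byte buffer is just a stand-in for a decompressed chunk). It shows two TypedArrays sharing one ArrayBuffer without copying, and that values only become JS Numbers when they are read.

```javascript
// An 8-byte buffer, standing in for a decompressed chunk.
const buffer = new ArrayBuffer(8);

// Two views over the same bytes; neither constructor copies the data.
const floats = new Float32Array(buffer); // 8 bytes / 4 bytes per element = 2 elements
const words = new Uint32Array(buffer);   // same bytes, read as 32-bit unsigned integers

// Writing through one view is visible through the other.
floats[0] = 1.0;
console.log(floats.length);               // 2
console.log(words[0].toString(16));       // "3f800000", the float32 bit pattern of 1.0
console.log(floats.buffer === buffer);    // true, the view wraps the original buffer

// Only here are the elements coerced into ordinary JS Numbers.
const asNumbers = Array.from(floats);
console.log(asNumbers);                   // [ 1, 0 ]
```

The same bytes produce different values depending on which view reads them, which is why the choice of TypedArray has to match the dtype of the stored data.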
The reason you cannot simply view f2 as f4 is that each element requires a different number of bytes with a different bit layout (as defined by IEEE 754):
[Figure: float16 bit layout]
[Figure: float32 bit layout]
So viewing an f2 buffer as a Float32Array either won't work (the buffer length must be a multiple of 4 for float32) or gives you a Float32Array that is half the length, with values that aren't useful. This can be illustrated by taking a Float64Array view of the underlying buffer of a Float32Array:
let f32 = new Float32Array([0, 1, 2, 3]);
let f64 = new Float64Array(f32.buffer);
console.log(f64) // Float64Array [ 0.0078125, 32.00000762939453 ]
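For comparison, getting usable float32 values out of f2 data means decoding every 16-bit element and copying it into a new array. The sketch below is only an illustration of that idea, not zarr.js code: decodeFloat16 and float16BufferToFloat32 are hypothetical helper names, and the 6-byte chunk is made-up sample data. It assumes the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 fraction bits) and uses a DataView so that both '<f2' (little-endian) and '>f2' (big-endian) can be read.

```javascript
// Hypothetical helper: decode one IEEE 754 half-precision bit pattern
// (given as a 16-bit unsigned integer) into a JS number.
function decodeFloat16(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f; // 5 exponent bits
  const fraction = bits & 0x03ff;       // 10 fraction bits
  if (exponent === 0) {
    // Zero and subnormal values: fraction * 2^-24.
    return sign * fraction * 2 ** -24;
  }
  if (exponent === 0x1f) {
    // All exponent bits set: infinity or NaN.
    return fraction ? NaN : sign * Infinity;
  }
  // Normal values: implicit leading 1, exponent bias of 15.
  return sign * (1 + fraction / 1024) * 2 ** (exponent - 15);
}

// Hypothetical helper: copy a raw f2 buffer into a new Float32Array.
// Unlike a TypedArray view, this allocates and converts element by element.
function float16BufferToFloat32(buffer, littleEndian = true) {
  if (buffer.byteLength % 2 !== 0) {
    throw new RangeError("f2 buffer length must be a multiple of 2");
  }
  const view = new DataView(buffer);
  const out = new Float32Array(buffer.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = decodeFloat16(view.getUint16(i * 2, littleEndian));
  }
  return out;
}

// Sample '<f2' data: 1.0 (0x3c00), -2.0 (0xc000), 0.5 (0x3800), little-endian.
const chunk = new Uint8Array([0x00, 0x3c, 0x00, 0xc0, 0x00, 0x38]).buffer;

// A direct float32 view fails here: 6 bytes is not a multiple of 4.
let viewError = null;
try {
  new Float32Array(chunk);
} catch (e) {
  viewError = e;
}
console.log(viewError instanceof RangeError); // true

const decoded = float16BufferToFloat32(chunk, true);
console.log(Array.from(decoded)); // [ 1, -2, 0.5 ]
```

The trade-off is that this gives up the no-copy property described above: the result uses twice the memory of the original chunk and costs one pass over the data.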
Thank you for the explanation! I did not realize that the zarr.js chunks were being passed directly to the TypedArray via ArrayBuffer. I wonder if https://github.com/petamoriken/float16 could be used in this case (though I have never tried it, I just came across the repo), but of course it would add a dependency.
Published in v0.6.0!
https://guido.io/zarr.js/#/advanced/float16
Is there any reason why this method shouldn't work? I am trying it and still getting the unsupported error. My imports look like this:
import { Float16Array } from "@petamoriken/float16";
// !Important! Make sure this global is set _before_ importing Zarr.js
globalThis.Float16Array = Float16Array;
import type { Float16ArrayConstructor } from "@petamoriken/float16";
declare global {
    var Float16Array: Float16ArrayConstructor;
}
import { HTTPStore, openArray } from "zarr";
import {slice as zarrSlice} from "zarr";