
Sparse Numpy Arrays

Open Matagi1996 opened this issue 5 months ago • 0 comments

🚀 Feature Request

Add sparse NumPy arrays as a supported field type.

Motivation

I was trying to serialize ~150 overlapping binary masks per image (think SAM auto-generated masks) to the Streaming format. With overlapping masks I can't save them as a single image (one pixel would need several values) or as an n-layer TIFF (too big), so I opted for the RLE format (usually JSON). Since saving 200+ JSON dicts per image also seems inefficient, I thought about encoding each RLE as a 1D vector [Size_X, Size_Y, RLE_Int64...] and saving them together as a NumPy array. At the moment this array needs to be zero-padded, because the RLE encoding has a different length depending on mask size and location.

The above encoding seems to work fine and seems fast. The problem is that RLE indices get big, so int64 is necessary, which makes padding to the longest RLE quite wasteful. If I could use sparse NumPy arrays, I would not need to pad the array to the longest sequence.
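For illustration, here is a minimal sketch of the encoding described above: each mask becomes a 1D int64 vector holding its shape followed by run lengths, and a batch of such vectors is zero-padded to the longest one. The function names (`mask_to_rle_vector`, `pad_rle_batch`) are hypothetical, not part of any library.

```python
import numpy as np

def mask_to_rle_vector(mask):
    """Encode a 2D binary mask as a 1D int64 vector:
    [size_y, size_x, run_1, run_2, ...].

    Runs alternate between background and foreground counts over the
    row-major flattened mask, starting with background (a leading zero
    is prepended if the mask starts with foreground)."""
    flat = mask.ravel().astype(np.int64)
    changes = np.flatnonzero(np.diff(flat))          # indices where the value flips
    boundaries = np.concatenate(([-1], changes, [flat.size - 1]))
    runs = np.diff(boundaries)                       # lengths of constant runs
    if flat[0] == 1:
        runs = np.concatenate(([0], runs))           # keep the bg-first convention
    return np.concatenate((np.array(mask.shape, dtype=np.int64), runs))

def pad_rle_batch(rle_vectors):
    """Zero-pad variable-length RLE vectors into one dense 2D int64 array.
    This is the wasteful step the issue is about: every row is padded to
    the length of the longest RLE in the batch."""
    max_len = max(len(v) for v in rle_vectors)
    out = np.zeros((len(rle_vectors), max_len), dtype=np.int64)
    for i, v in enumerate(rle_vectors):
        out[i, :len(v)] = v
    return out
```

For example, a 2x3 mask `[[0,1,1],[0,0,1]]` encodes to `[2, 3, 1, 2, 2, 1]` (shape, then runs of 1 background, 2 foreground, 2 background, 1 foreground pixels).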

[Optional] Implementation

Additional context

Since Streaming tries to keep data that belongs together as close as possible, I don't even know whether sparse arrays are achievable. Maybe there is a workaround with custom data structures that already exist, but I don't think that would be optimal. The built-in compression might also already be good enough.
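One possible workaround of the kind mentioned above, sketched here as an assumption rather than an existing Streaming feature, is a CSR-style ragged layout: concatenate all variable-length RLE vectors into one flat array plus an offsets array, so storage is exactly the sum of the lengths and no padding is needed.

```python
import numpy as np

def pack_ragged(vectors):
    """Pack variable-length int64 vectors into two dense arrays:
    a flat concatenation of all values plus start/end offsets
    (the same indptr idea used by CSR sparse matrices)."""
    lengths = np.array([len(v) for v in vectors], dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    data = np.concatenate(vectors).astype(np.int64)
    return data, offsets

def unpack_ragged(data, offsets):
    """Recover the original list of vectors from the packed form."""
    return [data[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

Both `data` and `offsets` are ordinary dense NumPy arrays, so they could be stored in formats that only support dense arrays, at the cost of one extra field per sample.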

Matagi1996 avatar Sep 10 '24 05:09 Matagi1996