streaming
Sparse NumPy Arrays
🚀 Feature Request
Add sparse NumPy arrays as a supported field type
Motivation
I was trying to serialize ~150 overlapping binary masks per image (think SAM auto-generated masks) to the Streaming format. With overlapping masks I can't save them as a single image (one pixel, several values) or as an n-layer TIFF (too big), so I opted for the RLE format (usually JSON). Since saving 200+ JSON dicts per image also seems inefficient, I thought about encoding each RLE as a 1D vector [Size_X, Size_Y, RLE_Int64] and saving them together as a NumPy array. At the moment this array needs to be zero-padded to the longest sequence, because the RLE encoding has a different length depending on mask size/location.
The above encoding seems to work fine and is fast. The problem: RLE indices get large, so int64 is necessary, which makes padding to the longest RLE quite wasteful. If I could use sparse NumPy arrays, I would not need to pad the array to the longest sequence.
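To illustrate the waste described above, here is a minimal sketch (my own, not part of Streaming) that encodes each mask as a [height, width, run_1, run_2, ...] int64 vector — assuming a COCO-style RLE with column-major order and runs starting with background — and then measures how much padding to the longest vector costs:

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> np.ndarray:
    """Encode a 2D binary mask as [height, width, run_1, run_2, ...] (int64).

    Runs alternate background/foreground, starting with background,
    as in COCO-style RLE (an assumption for this sketch).
    """
    flat = mask.flatten(order="F").astype(np.int64)
    # Indices where the value changes, plus both ends of the array.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    boundaries = np.concatenate(([0], change, [flat.size]))
    runs = np.diff(boundaries)
    if flat[0] == 1:  # first run must count background pixels, so prepend 0
        runs = np.concatenate(([0], runs))
    return np.concatenate((np.array(mask.shape, dtype=np.int64), runs))

# ~150 masks per image -> variable-length vectors of very different sizes.
rng = np.random.default_rng(0)
masks = [(rng.random((64, 64)) > 0.9).astype(np.uint8) for _ in range(150)]
encoded = [rle_encode(m) for m in masks]

lengths = np.array([e.size for e in encoded])
# Padding every vector to the longest one wastes int64 slots:
padded_cells = lengths.max() * len(encoded)
waste = 1 - lengths.sum() / padded_cells
print(f"padding overhead: {waste:.0%}")
```

The overhead grows with the spread of mask sizes, which is exactly what a sparse (or ragged) representation would avoid.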
[Optional] Implementation
Additional context
Since Streaming tries to keep data that belongs together as close as possible, I don't even know whether sparse arrays are achievable. Maybe a workaround with custom data structures already exists, but I don't think that would be optimal. The built-in compression might also already be good enough.
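As one possible custom-data-structure workaround (an ad-hoc sketch, not an existing Streaming feature): the ragged RLE vectors could be packed into a single bytes blob with a small length header, which a plain bytes field could store without any padding:

```python
import numpy as np

def pack_ragged(vectors: list[np.ndarray]) -> bytes:
    """Pack variable-length int64 vectors into one bytes blob.

    Layout (an ad-hoc scheme invented for this sketch):
    [count][len_1 ... len_count][data_1 ... data_count], all int64.
    """
    count = np.array([len(vectors)], dtype=np.int64)
    lengths = np.array([v.size for v in vectors], dtype=np.int64)
    data = (np.concatenate([v.astype(np.int64) for v in vectors])
            if vectors else np.array([], dtype=np.int64))
    return count.tobytes() + lengths.tobytes() + data.tobytes()

def unpack_ragged(blob: bytes) -> list[np.ndarray]:
    """Inverse of pack_ragged: recover the list of int64 vectors."""
    buf = np.frombuffer(blob, dtype=np.int64)
    count = int(buf[0])
    lengths = buf[1 : 1 + count]
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    data = buf[1 + count :]
    return [data[offsets[i] : offsets[i + 1]] for i in range(count)]
```

This stores exactly the bytes needed per sample, at the cost of doing the (de)serialization in user code rather than in the library.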