pp-sketchlib

HDF5 file needs to be split into multiple groups

Open johnlees opened this issue 2 years ago • 1 comment

Once the sketch group holds more than roughly 500k sketches, performance becomes very slow. It looks like this is because the HDF5 metadata cache isn't large enough: https://forum.hdfgroup.org/t/limit-on-the-number-of-datasets-in-one-group/5892

I think the solution will be to make subgroups 'sketch1', 'sketch2', etc., each holding a fixed block of sketches, say 30k. It just needs a bit of care to make sure it all stays backwards compatible with existing files.
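A minimal sketch of how the block layout could work, in Python rather than the library's own code. The function names `subgroup_name` and `sketch_path`, the 0-based insertion index, and the rule "use the flat entry if it exists" are all assumptions made for illustration; none of them come from pp-sketchlib.

```python
# Hypothetical helpers; not part of pp-sketchlib.
BLOCK_SIZE = 30_000  # block size suggested in the issue


def subgroup_name(index: int, block_size: int = BLOCK_SIZE) -> str:
    """Name of the subgroup holding the sketch at 0-based position `index`."""
    if index < 0:
        raise ValueError("index must be non-negative")
    # Positions 0..block_size-1 go to 'sketch1', the next block to 'sketch2', ...
    return f"sketch{index // block_size + 1}"


def sketch_path(root, name: str, index: int, block_size: int = BLOCK_SIZE) -> str:
    """Relative path of sketch `name` under the sketch group.

    `root` is anything that supports `in` on member names, such as a dict
    or an h5py.Group. If `name` already exists directly under `root`
    (the old flat layout), that entry is used, so old files stay readable.
    """
    if name in root:
        return name
    return f"{subgroup_name(index, block_size)}/{name}"
```

With this scheme `sketch_path({}, "sampleA", 60_000)` gives `sketch3/sampleA`, while a file written with the old layout still resolves `sampleA` directly. A real reader would not know the insertion index from the name alone, so it would also need a name-to-block lookup or a search over the subgroups.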

johnlees Apr 21 '22 08:04

I'm wondering if switching to Apache Arrow at some point might solve both this and #37.

johnlees May 03 '22 11:05