
Allow largeHash* and sparkey methods to set a byte size target

Open kellen opened this issue 2 months ago • 0 comments

Estimate the size of input collections and allow users to configure (rough) numBytes rather than numShards.
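A minimal sketch of the arithmetic this implies, assuming an estimated input size is already available. `ShardSizing`, `shardsForTarget`, and both parameter names are hypothetical and not part of the scio API:

```scala
object ShardSizing {
  // Hypothetical helper (not part of scio): derive a shard count from an
  // estimated input size and a rough per-shard byte target.
  def shardsForTarget(estimatedBytes: Long, targetBytesPerShard: Long): Int = {
    require(estimatedBytes >= 0, "estimatedBytes must be non-negative")
    require(targetBytesPerShard > 0, "targetBytesPerShard must be positive")
    // Ceiling division, written to avoid overflow on very large estimates.
    val full = estimatedBytes / targetBytesPerShard
    val remainder = if (estimatedBytes % targetBytesPerShard != 0) 1L else 0L
    // Always produce at least one shard, and clamp to the Int range.
    math.max(1L, math.min(full + remainder, Int.MaxValue.toLong)).toInt
  }
}
```

With this shape, a user would state only a size target such as 256 MiB per shard, and the shard count would follow from the estimate.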

I propose dropping numShards completely. I also propose dropping the special handling of "unsharded" sparkey and updating sparkey reads to infer sharding from filenames directly.
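A rough sketch of what inferring from filenames could look like. The `part-NNNNN-of-MMMMM` naming pattern, the `.spi`/`.spl` suffix check, and the names `ShardNames` and `inferNumShards` are assumptions for illustration, not the current scio behavior:

```scala
object ShardNames {
  // Assumed naming convention, for illustration only:
  //   <prefix>/part-00003-of-00010.spi (or .spl)
  private val Sharded = """.*part-(\d{1,9})-of-(\d{1,9})\.sp[il]$""".r

  // Returns Some(total) when every file name matches the assumed pattern and
  // all names agree on the total shard count. Returns None otherwise, for
  // example for an empty listing or a single file without a shard suffix.
  def inferNumShards(fileNames: Seq[String]): Option[Int] = {
    val totals = fileNames.map {
      case Sharded(_, total) => Some(total.toInt)
      case _                 => None
    }
    if (totals.nonEmpty && totals.forall(_.isDefined)) {
      val distinct = totals.flatten.distinct
      if (distinct.size == 1) Some(distinct.head) else None
    } else None
  }
}
```

A reader built this way would not need a separate "unsharded" code path: a listing that yields `None` could be treated as a single shard.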

kellen, May 02 '24 16:05