scio
Allow largeHash* and sparkey methods to set a byte size target
Estimate the size of input collections and allow users to configure (rough) numBytes rather than numShards.
I propose dropping numShards completely. I also propose dropping the special handling of "unsharded" sparkey and updating sparkey reads to infer the sharding from filenames directly.
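
To make the proposal concrete, here is a rough sketch of the two pieces: deriving a shard count from an estimated input size and a byte target, and inferring the shard count from filenames on read. Everything here is hypothetical and not existing scio API: `ShardSizing`, `shardsForTarget`, `inferNumShards`, and the parameter names are made up for illustration, and the `part-XXXXX-of-YYYYY.spi`/`.spl` naming pattern is an assumption about how sharded sparkey files are laid out.

```scala
object ShardSizing {
  // Hypothetical helper: number of shards so that each shard holds
  // roughly targetBytesPerShard bytes. Always returns at least 1.
  def shardsForTarget(estimatedBytes: Long, targetBytesPerShard: Long): Int = {
    require(estimatedBytes >= 0, "estimatedBytes must be non-negative")
    require(targetBytesPerShard > 0, "targetBytesPerShard must be positive")
    // Ceiling division, written to avoid overflow on very large inputs.
    val full = estimatedBytes / targetBytesPerShard
    val shards = if (estimatedBytes % targetBytesPerShard == 0) full else full + 1
    math.max(1L, math.min(shards, Int.MaxValue.toLong)).toInt
  }

  // Assumed naming convention: part-00000-of-00004.spi (index) or .spl (log).
  private val ShardFile = """.*part-(\d+)-of-(\d+)\.sp[il]$""".r

  // Hypothetical helper: returns the shard count if all matching files agree
  // on it, or None if no file matches or the totals are inconsistent.
  def inferNumShards(fileNames: Iterable[String]): Option[Int] = {
    val totals = fileNames.collect { case ShardFile(_, total) => total.toInt }.toSet
    if (totals.size == 1) Some(totals.head) else None
  }
}
```

With something like this, the writer only needs the size estimate and the user's byte target, and the reader needs no separate numShards or "unsharded" flag. How a `None` result should be handled on read (for example, treating it as a single file) is left open here.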