Dan King
Were you using the old copier or the new (not yet merged) `hailctl fs sync`? I had hoped the latter was finally robust enough for real use. `hailtop.aiotools.copy` is indeed...
I'm not certain this is any faster, so let's look at the benchmarks first.
There is clearly some benefit to compression time, but decompression is impaired in a few cases. We don’t have enough samples yet to know how much variance there is.
This is not clearly better than `develop`.
I believe we only greedily combine chunks; we don't split large ones. The bad case I had was a ~8M row array being sliced into ~100 64Ki arrays, each of...
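To make "greedily combine, never split" concrete, here is a rough Python sketch. The function name `greedy_combine` and the `target` parameter are hypothetical and are not taken from the actual code; the real policy may differ in its details.

```python
from typing import List


def greedy_combine(chunk_lens: List[int], target: int) -> List[List[int]]:
    """Group adjacent chunk lengths so each group stays at or under `target`.

    Hypothetical illustration: a chunk that is already larger than
    `target` is passed through whole, because nothing here splits chunks.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_len = 0
    for n in chunk_lens:
        # Close the current group if adding this chunk would overshoot.
        if current and current_len + n > target:
            groups.append(current)
            current, current_len = [], 0
        current.append(n)
        current_len += n
    if current:
        groups.append(current)
    return groups
```

With a target of 32, `greedy_combine([10, 20, 100, 5, 5], 32)` gives `[[10, 20], [100], [5, 5]]`: small neighbours are merged, and the oversized chunk of 100 comes out untouched.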
The reader no longer supports an externally requested chunk size, so this particular speed bump is gone.
Whipped up a hack today and it seems promising. CompressedBoolArray holds a u8 array in any encoding and does the bit indexing to get the values of interest. I looked...
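For anyone following along, the bit indexing amounts to something like the sketch below. This is not the actual CompressedBoolArray code. The helper names are made up, and the LSB-first bit order is an assumption borrowed from the Arrow bitmap convention.

```python
from typing import Iterable, List


def pack_bits(values: Iterable[bool]) -> bytes:
    """Pack booleans into bytes, least significant bit first (assumed order)."""
    vals: List[bool] = list(values)
    out = bytearray((len(vals) + 7) // 8)
    for i, v in enumerate(vals):
        if v:
            out[i >> 3] |= 1 << (i & 7)
    return bytes(out)


def get_bit(packed: bytes, i: int) -> bool:
    """Read logical bit i: byte i // 8, then bit i % 8 within that byte."""
    return (packed[i >> 3] >> (i & 7)) & 1 == 1
```

The point is that the u8 buffer can be stored in whatever encoding compresses best, since only the bytes that hold the requested bits need to be materialized before `get_bit` runs.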
See here: https://github.com/vortex-data/vortex/pull/5230
I also tried a compressor which converts the validity into its indices array and compresses that. This seems to only be good for things that are extremely sparsely null. Intuitively,...
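A rough sketch of the validity-to-indices conversion, with a back-of-envelope size comparison. The function names are hypothetical, the 4-byte index width is an assumption, and the estimate ignores whatever compression is applied to either representation afterwards.

```python
from typing import List, Sequence


def validity_to_null_indices(validity: Sequence[bool]) -> List[int]:
    """Return the positions of the null (invalid) entries."""
    return [i for i, valid in enumerate(validity) if not valid]


def null_indices_to_validity(indices: Sequence[int], length: int) -> List[bool]:
    """Rebuild the validity mask from null positions."""
    validity = [True] * length
    for i in indices:
        validity[i] = False
    return validity


def bitmap_bytes(length: int) -> int:
    """Uncompressed size of a one-bit-per-row validity bitmap."""
    return (length + 7) // 8


def index_bytes(null_count: int, index_width: int = 4) -> int:
    """Uncompressed size of the null indices, assuming fixed-width integers."""
    return null_count * index_width
```

Under those assumptions the uncompressed indices are smaller than the bitmap only when fewer than roughly 1 in 32 rows is null, since `4 * k < n / 8` means `k < n / 32`.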
Heh. Okay, it looks like we inherit from pyarrow.dataset.Scanner but never define a projected_schema. I’m not sure exactly how that turns into a segfault but the fix is clear: define...