cudf
cudf copied to clipboard
Truncate parquet column indexes
Description
Adds truncation of the min and max values of the Parquet column indexes, as recommended by the Parquet format. Adds a parameter column_index_truncate_length to the writer options/builder. It currently defaults to 64, which is the default used by parquet-mr.
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Can one of the admins verify this patch?
Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.
add to allowlist
Codecov Report
:exclamation: No coverage uploaded for pull request base (
branch-22.10@d86bb39). Click here to learn what that means. The diff coverage isn/a.
@@ Coverage Diff @@
## branch-22.10 #11403 +/- ##
===============================================
Coverage ? 86.36%
===============================================
Files ? 145
Lines ? 22950
Branches ? 0
===============================================
Hits ? 19821
Misses ? 3129
Partials ? 0
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
@hyperbolic2346 the merge race is on again! #11424 is going to break me here :) Since you're all up in the stats space, could you take a peek, esp at the TODO comment above page_enc.cu:truncate_string()?
rerun tests
I think we're getting there. I'll make one more pass over the whole thing tomorrow.
Thanks for all the cycles you're spending on this!
So it occurred to me that I can overcome my std::byte fear and loathing with a few well placed operator overloads.
constexpr std::byte operator| (std::byte lhs, uint8_t rhs) noexcept
{
return static_cast<std::byte>(static_cast<uint8_t>(lhs) | rhs);
}
constexpr std::byte operator& (std::byte lhs, uint8_t rhs) noexcept
{
return static_cast<std::byte>(static_cast<uint8_t>(lhs) & rhs);
}
constexpr bool operator==(std::byte lhs, uint8_t rhs) noexcept
{
return lhs == std::byte{rhs};
}
constexpr bool operator!=(std::byte lhs, uint8_t rhs) noexcept
{
return lhs != std::byte{rhs};
}
and suddenly it's not so bad. Beyond scope for now, but food for thought. Can't wait for operator<=>
and suddenly it's not so bad. Beyond scope for now, but food for thought.
That really isn't so bad, but at the end of the day you have to wonder what we gained outside of making someone happy that std::byte was being used.
@gpucibot merge