cudf

Fix bugs in handling of delta encodings

Open etseidl opened this issue 1 year ago • 2 comments

Description

Part of #14938 was fixing two bugs discovered during testing. The first is in the encoding of DELTA_BINARY_PACKED data, and occurs when the first non-null value in a page to be encoded is not in the first batch of 129 values. The second is in the decoding of DELTA_BYTE_ARRAY pages, and again occurs when the first non-null value is not in the first block to be decoded.
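For readers unfamiliar with the encoding, here is a minimal Python sketch of the first case. It is not the cudf kernel: it omits the block/miniblock bit-packing and the header layout, and the names `delta_encode`, `delta_decode`, and `BATCH_ROWS` are hypothetical. It only illustrates that nulls do not appear in the value stream, so the first value stored for the page has to come from the first non-null row, whichever batch that row falls in.

```python
from typing import List, Optional, Tuple

# Hypothetical constant: the batch size mentioned in the description,
# used here only to chunk the input rows.
BATCH_ROWS = 129


def delta_encode(
    rows: List[Optional[int]], batch_rows: int = BATCH_ROWS
) -> Tuple[Optional[int], List[int]]:
    """Return (first_value, deltas) for the non-null values in rows."""
    first_value = None
    previous = None
    deltas: List[int] = []
    for start in range(0, len(rows), batch_rows):
        for value in rows[start:start + batch_rows]:
            if value is None:
                continue  # nulls are not part of the value stream
            if previous is None:
                first_value = value  # may occur in any batch, not only the first
            else:
                deltas.append(value - previous)
            previous = value
    return first_value, deltas


def delta_decode(first_value: Optional[int], deltas: List[int]) -> List[int]:
    """Rebuild the non-null values from the first value and the deltas."""
    if first_value is None:
        return []
    values = [first_value]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values


# 200 leading nulls push the first non-null value past the first batch.
rows = [None] * 200 + [7, 10, 8]
first_value, deltas = delta_encode(rows)
decoded = delta_decode(first_value, deltas)
```

With the input above, the first non-null value sits in the second batch, which is the situation the encoder fix addresses.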

This PR includes a test for the former. The latter cannot easily be tested because the Python API still lacks skip_rows, and DELTA_BYTE_ARRAY encoded data cannot be generated without the changes in #14938. A test for the latter will be added later; in the meantime, the fix has been validated locally with data on hand.
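As background for the second case, the following is a simplified Python sketch of the idea behind DELTA_BYTE_ARRAY: each non-null value is stored as the length of the prefix it shares with the previous value, plus the remaining suffix. It is an illustration under assumptions, not the cudf decoder: the function names are hypothetical, and the real format additionally delta-encodes the prefix and suffix lengths. The point is that the first non-null value always has a prefix length of 0, regardless of how many nulls precede it.

```python
from typing import List, Optional, Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def delta_byte_array_encode(
    rows: List[Optional[bytes]],
) -> Tuple[List[int], List[bytes]]:
    """Return (prefix_lengths, suffixes) for the non-null values in rows."""
    prefix_lengths: List[int] = []
    suffixes: List[bytes] = []
    previous = b""
    for value in rows:
        if value is None:
            continue  # nulls are not part of the value stream
        n = common_prefix_len(previous, value)
        prefix_lengths.append(n)
        suffixes.append(value[n:])
        previous = value
    return prefix_lengths, suffixes


def delta_byte_array_decode(
    prefix_lengths: List[int], suffixes: List[bytes]
) -> List[bytes]:
    """Rebuild each value from the previous value's prefix and its own suffix."""
    values: List[bytes] = []
    previous = b""
    for n, suffix in zip(prefix_lengths, suffixes):
        previous = previous[:n] + suffix
        values.append(previous)
    return values


# 300 leading nulls, then three strings that share prefixes.
rows = [None] * 300 + [b"apple", b"applet", b"apply"]
prefix_lengths, suffixes = delta_byte_array_encode(rows)
decoded = delta_byte_array_decode(prefix_lengths, suffixes)
```

Because every value depends on the one before it, a decoder that starts partway through a page needs the correct previous string in hand, which is why leading nulls that span a whole block are worth testing.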

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

etseidl Feb 16 '24 18:02

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] Feb 16 '24 18:02

/ok to test

vuule Feb 16 '24 18:02

/ok to test

davidwendt Feb 22 '24 00:02

/merge

vuule Feb 22 '24 08:02