
Add test of interoperability of cuDF and arrow BYTE_STREAM_SPLIT encoders

Open etseidl opened this issue 1 year ago • 2 comments

Description

BYTE_STREAM_SPLIT encoding was recently added to cuDF (#15311). The Parquet specification was changed (https://github.com/apache/parquet-format/pull/229) to extend the data types that can be encoded as BYTE_STREAM_SPLIT, and Arrow has only recently implemented that extension (https://github.com/apache/arrow/pull/40094). This PR adds a test that cuDF and Arrow produce compatible files when using BYTE_STREAM_SPLIT encoding.
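For readers unfamiliar with the encoding, the following is a minimal pure-Python sketch of the byte layout BYTE_STREAM_SPLIT produces: for values of width K bytes, the encoder writes K streams, where stream k holds byte k of every value. It is an illustration only, not the cuDF or Arrow implementation and not the test added by this PR. The function names and the `fmt` parameter are hypothetical, and little-endian fixed-width values are assumed.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Illustrative encoder (hypothetical helper, not a cuDF/Arrow API).

    `fmt` is a struct format for one fixed-width value, e.g. "<f", "<d", "<i".
    """
    width = struct.calcsize(fmt)
    # Plain layout: the bytes of each value stored back to back.
    raw = b"".join(struct.pack(fmt, v) for v in values)
    # Stream k collects byte k of every value; streams are concatenated.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_split_decode(data, fmt="<f"):
    """Inverse of the sketch above: re-interleave the streams."""
    width = struct.calcsize(fmt)
    count = len(data) // width
    # Byte k of value i sits at offset k * count + i in the encoded buffer.
    raw = bytes(data[k * count + i] for i in range(count) for k in range(width))
    return [struct.unpack_from(fmt, raw, i * width)[0] for i in range(count)]


floats = [1.0, 2.5, -3.25]
roundtrip = byte_stream_split_decode(byte_stream_split_encode(floats))

ints = [0x04030201, 0x0D0C0B0A]
encoded_ints = byte_stream_split_encode(ints, fmt="<i")
```

Because the transform is a fixed permutation of bytes, two writers that follow the specification should emit identical page payloads for the same input, which is the property an interoperability test relies on.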

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

etseidl avatar May 22 '24 23:05 etseidl

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar May 22 '24 23:05 copy-pr-bot[bot]

/ok to test

vyasr avatar May 23 '24 02:05 vyasr

/ok to test

wence- avatar Jun 13 '24 14:06 wence-

/ok to test

wence- avatar Jun 24 '24 13:06 wence-

/merge

wence- avatar Jun 24 '24 13:06 wence-

Thanks @etseidl

wence- avatar Jun 24 '24 13:06 wence-