cudf
[BUG]: Incorrect missing values in pylibcudf.Table.from_arrow on a sliced, string_view column
Describe the bug
When calling plc.Table.from_arrow on a pyarrow Table with a sliced string_view column, the validity mask appears to be handled incorrectly, and a null value comes back as a non-null one:
Steps/Code to reproduce bug
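import pyarrow as pa
import pylibcudf as plc

# Two-row string_view column whose second value is null.
table = pa.table({"a": pa.array(["a", None], pa.string_view())})
# Slice off the first row so only the null remains, then round-trip through pylibcudf.
roundtrip = plc.interop.to_arrow(plc.interop.from_arrow(table.slice(1, 2)))
result = roundtrip.columns[0][0].as_py()
assert result is None, result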
That fails with:
AssertionError:
(Not the most helpful message, but result is the empty string rather than None.)
Expected behavior
result should be None.
Environment overview (please complete the following information)
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
- Method of cuDF install: [conda, Docker, or from source]
- If method of install is [Docker], provide the docker pull & docker run commands used
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
Additional context
This may be the root cause of https://github.com/rapidsai/cudf/issues/19148. Recheck that issue once this one is fixed.
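For context, here is a minimal pure-Python sketch of how an Arrow validity bitmap is read for a sliced array. In the Arrow format a slice is zero-copy: it keeps the parent's bitmap plus an offset, bits are stored least-significant-bit first, and a set bit means "valid". The helper name is_valid and the literal bitmap are illustrative assumptions, not pylibcudf code, and whether a dropped offset is the actual cause here has not been confirmed.

```python
def is_valid(bitmap: bytes, offset: int, i: int) -> bool:
    """Hypothetical helper: validity of logical element i of a slice.

    Arrow validity bitmaps are LSB-ordered and a set bit means "valid".
    A sliced array reuses the parent's bitmap, so the slice offset must
    be added to the logical index before the bit is looked up.
    """
    bit = offset + i
    return bool((bitmap[bit // 8] >> (bit % 8)) & 1)


# Parent column ["a", None]: element 0 is valid, element 1 is null.
parent_bitmap = bytes([0b00000001])

# table.slice(1, 2) yields one row that starts at offset 1 in the parent.
slice_offset = 1

# Honouring the offset reads bit 1, which is unset, so the row is null.
with_offset = is_valid(parent_bitmap, slice_offset, 0)

# Ignoring the offset reads bit 0 instead, so the row looks valid.
without_offset = is_valid(parent_bitmap, 0, 0)
```

If the offset were dropped somewhere during conversion, the null row would be reported as valid, which would be consistent with the symptom above, though this is only one possible explanation.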