[Python] Cannot cast nested nullable field to not-nullable
Casting a nullable field to non-nullable works provided no values are null. For example, this is a valid cast:
import pyarrow as pa
table = pa.table({'column_1': pa.array([1, 2, 3])})
table.cast(
    pa.schema([
        f.with_nullable(False) for f in table.schema
    ])
)
But it doesn't work for nested fields. Here's an example:
import pyarrow as pa
record = {"nested_int": 1}
data_type = pa.struct(
    [
        pa.field("nested_int", pa.int32(), nullable=True),
    ]
)
data_type_after = pa.struct(
    [
        pa.field("nested_int", pa.int32(), nullable=False),
    ]
)
table = pa.table({"column_1": pa.array([record], data_type)})
table.cast(pa.schema([pa.field("column_1", data_type_after)]))
Throws:
pyarrow.lib.ArrowTypeError: cannot cast nullable field to non-nullable field: struct<nested_int: int32> struct<nested_int: int32 not null>
This is somewhat related to https://github.com/apache/arrow/issues/13177 and https://issues.apache.org/jira/browse/ARROW-16603
Reporter: &res / @0x26res
Note: This issue was originally created as ARROW-18430. Please see the migration documentation for further details.
Just ran into this as well:
>>> arr = pa.array([{'x': 1.0, 'y': 2.0}, {'x': 2.0, 'y': 3.0}])
>>> arr.type
StructType(struct<x: double, y: double>)
>>> arr.cast(pa.struct([pa.field("x", pa.float64(), nullable=False), pa.field("y", pa.float64(), nullable=False)]))
...
ArrowTypeError: cannot cast nullable field to non-nullable field: struct<x: double, y: double> struct<x: double not null, y: double not null>
/home/joris/scipy/repos/arrow/cpp/src/arrow/compute/exec.cc:920 kernel_->exec(kernel_ctx_, input, &output)
/home/joris/scipy/repos/arrow/cpp/src/arrow/compute/function.cc:277 executor->Execute(input, &listener)
I am moving this to 17.0.0. Let me know if it should be part of 16.0.0.
Also ran into this, would be great to address this soonish.
Also just ran into this. I'm not at all familiar with Arrow's codebase, but if this isn't that hard and someone can give me pointers, I can take a stab at it. I opened #43782 as a first attempt and would love some help with it if you can.
Issue resolved by pull request 43782 https://github.com/apache/arrow/pull/43782