datafusion
Failed to cast `[]` to `FixedSizeList(1, Null)`
Describe the bug
DataFusion CLI v35.0.0
❯ select arrow_cast(make_array(), 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0
❯ select arrow_cast([], 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0
To Reproduce
No response
Expected behavior
No response
Additional context
No response
I wonder if that's meant to work. The following works:
select arrow_cast([null], 'FixedSizeList(1, Null)');
However, if you wanted a zero-sized list, then shouldn't it be
select arrow_cast([], 'FixedSizeList(0, Null)');
However, that throws the following error:
thread 'main' panicked at arrow-datafusion/datafusion/common/src/scalar.rs:3184:5:
assertion `left == right` failed
left: 0
right: 1
Panicking is definitely not good.
I'll see what I can do
take
I've done some digging but did not find an easy fix, only the few options listed below. Happy to follow up, but I need a decision on which fix to attempt.
The following works: `select arrow_cast([null], 'FixedSizeList(1, Null)');`, so it would be logical to use `FixedSizeList(0, Null)` when casting an empty array (`select arrow_cast([], 'FixedSizeList(0, Null)');`). However, that doesn't work because of the following:
- The docstring for `datafusion_common::ScalarValue::FixedSizeList` says "The array must be a FixedSizeListArray with length 1." (the same applies to other `Scalar::List*` types), so any length other than 1 would be invalid.
  https://github.com/r3stl355/arrow-datafusion/blob/3b355c798a3258f118016b33f26c5a55fed36220/datafusion/common/src/scalar/mod.rs#L231
- During the cast, `datafusion_common::ScalarValue::FixedSizeList(0, Null)` is converted to `arrow_schema::datatype::DataType::FixedSizeList(FieldRef, 0)` before being passed to `arrow::compute::kernels::cast::cast_with_options` for evaluation of the `arrow_cast`.
- `arrow::compute::kernels::cast::cast_with_options` returns a `FixedSizeListArray<0>` of length 0 when called with `arrow_schema::datatype::DataType::FixedSizeList(FieldRef, 0)`. Note that this is different for any length greater than 0 used in `FixedSizeList` (i.e. the return value will always be of length 1). For example, when called with `FixedSizeList(FieldRef, 2)` as the cast type, `arrow::compute::kernels::cast::cast_with_options` returns a `FixedSizeListArray<2>` with length 1.
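To make the mismatch easier to see, here is a toy model in plain Rust. It only encodes the behaviour reported in the list above; `observed_cast_len` and `scalar_from_len` are hypothetical names for illustration, not functions from arrow or DataFusion.

```rust
/// Toy model: number of rows the cast was observed to return for a single
/// input list, as reported above (0 rows for list size 0, 1 row otherwise).
/// Hypothetical helper, not part of arrow.
fn observed_cast_len(list_size: usize) -> usize {
    if list_size == 0 {
        0
    } else {
        1
    }
}

/// Toy model of the documented scalar invariant: the wrapped array must
/// have exactly one row. Returns an error instead of asserting.
/// Hypothetical helper, not part of DataFusion.
fn scalar_from_len(array_len: usize) -> Result<(), String> {
    if array_len == 1 {
        Ok(())
    } else {
        Err(format!(
            "expected an array of length 1, got length {}",
            array_len
        ))
    }
}

fn main() {
    for size in [0usize, 1, 2] {
        let len = observed_cast_len(size);
        println!(
            "list size {}: cast length {}, scalar check: {:?}",
            size,
            len,
            scalar_from_len(len)
        );
    }
}
```

In this model only list size 0 fails the length-1 check, which lines up with the `left: 0`, `right: 1` assertion in the panic message.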
The possible fix options are:
1. Raise an exception if 0 is used as the size in the cast target type (i.e. `FixedSizeList(0, Null)`).
2. Try to convert `FixedSizeList(FieldRef, 0)` to `FixedSizeList(FieldRef, 1)` before calling `cast_with_options`, but (A) this feels really wrong and (B) it may still not work.
3. Raise an issue in Arrow asking to return a non-empty array when `cast_with_options` is called with `FixedSizeList(FieldRef, 0)`. I'll do some digging there to see if it's possible, e.g. if `FixedSizeListArray<0>[NullArray(0),]` would be a valid type.
Lastly, this error happens when displaying the result but not when applying some other functions to it, e.g. the following works, but it's the only function I tested it with:
select arrow_typeof(arrow_cast([], 'FixedSizeList(0, Null)'));
I prefer 1. I think a FixedSizeList with length 0 is the same as an empty list. I don't think there is any useful case where we need to cast an empty list to FixedSizeList(0, type). Return exec_error if casting to FixedSizeList(0, any type). We just need to avoid the panic for this cast.
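A rough sketch of what option 1 could look like, in plain Rust with no DataFusion dependency. `TargetType`, `validate_cast_target`, and the error text are all hypothetical stand-ins for illustration; the real check would sit wherever the cast target is resolved and would use DataFusion's own error type.

```rust
/// Hypothetical, simplified stand-in for a cast target type.
#[derive(Debug, Clone, PartialEq)]
enum TargetType {
    List,
    /// Fixed-size list with the given number of elements per row.
    FixedSizeList(i32),
}

/// Rejects a zero-sized FixedSizeList target before any cast is attempted,
/// so the caller gets an error instead of a panic later on.
fn validate_cast_target(target: &TargetType) -> Result<(), String> {
    match target {
        TargetType::FixedSizeList(size) if *size == 0 => Err(format!(
            "Execution error: casting to FixedSizeList({}, ..) is not supported",
            size
        )),
        _ => Ok(()),
    }
}

fn main() {
    let targets = [
        TargetType::List,
        TargetType::FixedSizeList(1),
        TargetType::FixedSizeList(0),
    ];
    for target in &targets {
        println!("{:?} -> {:?}", target, validate_cast_target(target));
    }
}
```

Checking the target type up front keeps the failure in the ordinary error path, which is the main goal here.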
@Weijun-H was there any specific reason you were trying to achieve this (i.e. select arrow_cast(make_array(), 'FixedSizeList(1, Null)');)?
There is no particular use case right now; I am working on #9108, which reminded me of this case. I also vote for the first solution, which is more reasonable.
I unassigned myself from this issue as I don't have much bandwidth at the moment so maybe someone else is willing to implement the changes. If nobody does then I'll come back to this in 2-3 weeks.
Looks like this is still open, happy to resume if no one else is working on it.