Failed to cast `[]` to `FixedSizeList(1, Null)`

Open Weijun-H opened this issue 1 year ago • 10 comments

Describe the bug

DataFusion CLI v35.0.0
❯ select arrow_cast(make_array(), 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0

❯ select arrow_cast([], 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Weijun-H avatar Feb 08 '24 02:02 Weijun-H

I wonder if that's meant to work. The following works:

select arrow_cast([null], 'FixedSizeList(1, Null)');
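
A rough sketch of why `[null]` casts while `[]` does not, assuming the cast simply requires every list value to hold exactly as many elements as the target size. The function `check_fixed_size` is hypothetical and only models the check; it is not the arrow API.

```rust
/// Hypothetical model of the per-value length check; not the arrow-rs API.
/// Each inner Vec stands for one list value (one row) of the input array.
fn check_fixed_size(rows: &[Vec<Option<i32>>], size: usize) -> Result<(), String> {
    for (index, row) in rows.iter().enumerate() {
        if row.len() != size {
            // Message shaped after the error reported in this issue.
            return Err(format!(
                "Cannot cast to FixedSizeList({size}): value at index {index} has length {}",
                row.len()
            ));
        }
    }
    Ok(())
}

fn main() {
    // `[null]` is one list value holding one element, so size 1 fits.
    assert!(check_fixed_size(&[vec![None]], 1).is_ok());
    // `[]` is one list value holding zero elements, so size 1 is rejected.
    let err = check_fixed_size(&[vec![]], 1).unwrap_err();
    println!("{err}");
}
```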

However, if you wanted a zero-sized list, then should it be

select arrow_cast([], 'FixedSizeList(0, Null)');

However, that throws the following error:

thread 'main' panicked at arrow-datafusion/datafusion/common/src/scalar.rs:3184:5:
assertion `left == right` failed
  left: 0
 right: 1

r3stl355 avatar Feb 08 '24 20:02 r3stl355

Panicking is definitely not good

alamb avatar Feb 09 '24 20:02 alamb

I'll see what I can do

r3stl355 avatar Feb 12 '24 20:02 r3stl355

take

r3stl355 avatar Feb 12 '24 20:02 r3stl355

I've done some digging but did not find an easy fix, only the few options listed below. Happy to follow up, but I need a decision on which fix to attempt.

The following works select arrow_cast([null], 'FixedSizeList(1, Null)'); so it's logical to use FixedSizeList(0, Null) when casting an empty array (select arrow_cast([], 'FixedSizeList(0, Null)');). However, that doesn't work because of the following:

  • Docstring for datafusion_common::ScalarValue::FixedSizeList says "The array must be a FixedSizeListArray with length 1." (the same applies to other Scalar::List* types) so any length other than 1 would be invalid.

https://github.com/r3stl355/arrow-datafusion/blob/3b355c798a3258f118016b33f26c5a55fed36220/datafusion/common/src/scalar/mod.rs#L231

  • During the cast, datafusion_common::ScalarValue::FixedSizeList(0, Null) is converted to arrow_schema::datatype::DataType::FixedSizeList(FieldRef, 0) before being passed to arrow::compute::kernels::cast::cast_with_options for evaluation of the arrow_cast

  • arrow::compute::kernels::cast::cast_with_options returns a FixedSizeListArray<0> of length 0 when called with arrow_schema::datatype::DataType::FixedSizeList(FieldRef, 0). Note that this differs from any list size greater than 0, where the return value always has length 1. For example, when called with FixedSizeList(FieldRef, 2) as the cast type, arrow::compute::kernels::cast::cast_with_options returns a FixedSizeListArray<2> with length 1.
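
The three points above can be sketched as a toy model. This is only an illustration under stated assumptions: `FixedSizeListModel` and `try_scalar` are hypothetical stand-ins, not DataFusion or Arrow types, and the check returns an error where the real code asserts.

```rust
/// Hypothetical stand-in for a fixed-size list array; only the two numbers
/// that matter for this issue are modeled.
struct FixedSizeListModel {
    list_size: usize, // elements per row (the N in FixedSizeList(N, ..))
    len: usize,       // number of rows in the array
}

/// Hypothetical stand-in for building a list scalar, which per the docstring
/// must wrap an array of length exactly 1.
fn try_scalar(array: FixedSizeListModel) -> Result<FixedSizeListModel, String> {
    if array.len != 1 {
        return Err(format!(
            "FixedSizeList({}) scalar needs an array of length 1, got length {}",
            array.list_size, array.len
        ));
    }
    Ok(array)
}

fn main() {
    // As described above: a cast to list size 2 yields an array of length 1.
    let ok = try_scalar(FixedSizeListModel { list_size: 2, len: 1 }).unwrap();
    println!("list_size={} len={}", ok.list_size, ok.len);

    // As described above: a cast to list size 0 yields an array of length 0,
    // which breaks the length-1 requirement (left: 0, right: 1 in the panic).
    match try_scalar(FixedSizeListModel { list_size: 0, len: 0 }) {
        Ok(_) => println!("unexpectedly accepted"),
        Err(e) => println!("{e}"),
    }
}
```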

The possible fix options are:

  1. Raise an exception if 0 is used as the list size in the cast target type (i.e. 'FixedSizeList(0, Null)')
  2. Try to convert FixedSizeList(FieldRef, 0) to FixedSizeList(FieldRef, 1) before calling cast_with_options but A. this feels really wrong and B. may still not work
  3. Raise an issue in Arrow asking to return a non-empty array when cast_with_options is called with FixedSizeList(FieldRef, 0). I'll do some digging there to see if it's possible, e.g if FixedSizeListArray<0>[NullArray(0),] would be a valid type

Lastly, this error happens when displaying the result but not when applying some other functions to it, e.g. the following works, but it's the only function I tested it with:

select arrow_typeof(arrow_cast([], 'FixedSizeList(0, Null)'));

r3stl355 avatar Feb 24 '24 10:02 r3stl355

I prefer option 1. I think a FixedSizeList with length 0 is the same as an empty list, and I don't see any useful case where we need to cast an empty list to FixedSizeList(0, type). We can return exec_error when casting to FixedSizeList(0, any type). We just need to avoid the panic for this cast.
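
A minimal sketch of what option 1 could look like, assuming the check runs before the cast is attempted. `TargetType` and `validate_cast_target` are hypothetical names for illustration, and the `String` error stands in for whatever error type DataFusion would actually return.

```rust
/// Hypothetical, simplified cast target; the element type is omitted.
#[derive(Debug)]
enum TargetType {
    List,
    FixedSizeList(usize),
}

/// Reject a zero-sized FixedSizeList target up front instead of panicking later.
fn validate_cast_target(target: &TargetType) -> Result<(), String> {
    match target {
        TargetType::FixedSizeList(0) => {
            Err("Casting to FixedSizeList(0, ..) is not supported".to_string())
        }
        _ => Ok(()),
    }
}

fn main() {
    for target in [
        TargetType::List,
        TargetType::FixedSizeList(1),
        TargetType::FixedSizeList(0),
    ] {
        match validate_cast_target(&target) {
            Ok(()) => println!("{target:?}: accepted"),
            Err(e) => println!("{target:?}: rejected ({e})"),
        }
    }
}
```

With a guard like this, the zero-size case surfaces as an ordinary query error rather than an assertion failure when the result is displayed.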

jayzhan211 avatar Feb 24 '24 12:02 jayzhan211

@Weijun-H was there any specific reason you were trying to achieve this (i.e. select arrow_cast(make_array(), 'FixedSizeList(1, Null)');)?

r3stl355 avatar Feb 24 '24 13:02 r3stl355

@Weijun-H was there any specific reason you were trying to achieve this (i.e. select arrow_cast(make_array(), 'FixedSizeList(1, Null)');)?

There are no particular use cases right now. I am working on #9108, which reminded me of this case. I also vote for the first solution, which is more reasonable.

Weijun-H avatar Feb 25 '24 02:02 Weijun-H

I unassigned myself from this issue as I don't have much bandwidth at the moment, so maybe someone else is willing to implement the changes. If nobody does, I'll come back to this in 2-3 weeks.

r3stl355 avatar Mar 22 '24 12:03 r3stl355

Looks like this is still open; happy to resume if no one else is working on it

r3stl355 avatar Apr 30 '24 11:04 r3stl355