Check that the UDF output size equals the input size
Hmmm... or should we check the result size of the UDF? I'm not sure whether it's proper for the sizes of the input and the result to differ. cc @alamb @mingmwang @tustvold
Originally posted by @doki23 in https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475092781
@alamb @doki23 it seems that this issue is fixed?
I'm not sure :(
I think the idea of this ticket was to put some basic checks / asserts in place to ensure that the output of UDFs has the correct number of rows.
As I understand it, this would mean adding an assert (or checking whether one already exists) that the number of output rows from accumulators is correct.
Maybe somewhere in
https://github.com/apache/arrow-datafusion/blob/main/datafusion/physical-plan/src/aggregates/row_hash.rs
Hi, I would like to take this issue.
So the idea here is that we add a check after invoking a ScalarUDF that the number of rows that came out is the same as the number that went in. If this is not the case, DataFusion should raise an internal error with a clear error message.
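To make the proposed check concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's actual API: `InternalError`, `invoke_checked`, and the `&[i64]` / `Vec<i64>` stand-ins for Arrow arrays are all hypothetical names chosen for illustration only.

```rust
use std::fmt;

/// Hypothetical stand-in for an engine's internal error (not a DataFusion type).
#[derive(Debug, PartialEq)]
struct InternalError(String);

impl fmt::Display for InternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Internal error: {}", self.0)
    }
}

/// Invoke `udf` on `input`, then verify it produced exactly one output row
/// per input row. A slice of i64 stands in for a column of values.
fn invoke_checked<F>(name: &str, udf: F, input: &[i64]) -> Result<Vec<i64>, InternalError>
where
    F: Fn(&[i64]) -> Vec<i64>,
{
    let output = udf(input);
    if output.len() != input.len() {
        // Report the UDF name and both row counts so the failure is easy to diagnose.
        return Err(InternalError(format!(
            "UDF '{}' returned {} rows, expected {}",
            name,
            output.len(),
            input.len()
        )));
    }
    Ok(output)
}
```

A well-behaved UDF such as `|xs: &[i64]| xs.iter().map(|x| x * 2).collect()` passes through unchanged, while one that drops rows is turned into an error carrying the UDF name and both row counts.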
This was implemented in this PR (and we fixed 2 existing UDFs that violated this constraint).