datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Support nulls and empty for array functions

Open jayzhan211 opened this issue 2 years ago • 7 comments

Which issue does this PR close?

Closes #7142. Re-open 7142 if others than append/prepend need to support nulls and empty.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jayzhan211 avatar Aug 19 '23 01:08 jayzhan211

@izveigor @alamb Ready for review, thanks!

jayzhan211 avatar Aug 19 '23 08:08 jayzhan211

I'll put this on my list -- thanks @jayzhan211

alamb avatar Aug 21 '23 18:08 alamb

I have extensively restructured the code in order to enhance its readability and extensibility. Also, array_concat is considered. I'm sure this PR can be further improved but it is time to seek feedback to prevent me from working in the wrong direction.

cc @alamb @izveigor

jayzhan211 avatar Aug 27 '23 07:08 jayzhan211

Hello, @jayzhan211! Sorry for the long delay. I will carefully review these changes tomorrow.

However, I have some questions about the nature of functions in Arrow Datafusion. I recently thought about solving the issue: https://github.com/apache/arrow-datafusion/issues/6559 and I noticed a fact that User Defined Functions in DataFusion does not parameterize the null handling behaviour (Unlike DuckDB with special link) So If we want to implement new array or other nested data types functions (like struct_insert) maybe it is better to remake signature's structure (I mean adding new boolean argument special to Signature).

I want to hear @alamb's opinion about this opportunity.

izveigor avatar Aug 28 '23 18:08 izveigor

Hello, @jayzhan211! Sorry for the long delay. I will carefully review these changes tomorrow.

However, I have some questions about the nature of functions in Arrow Datafusion. I recently thought about solving the issue: https://github.com/apache/arrow-datafusion/issues/6559 and I noticed a fact that User Defined Functions in DataFusion does not parameterize the null handling behaviour (Unlike DuckDB with special link) So If we want to implement new array or other nested data types functions (like struct_insert) maybe it is better to remake signature's structure (I mean adding new boolean argument special to Signature).

I want to hear @alamb's opinion about this opportunity.

I'm not familiar with udf. Is it possible to customize the behavior of null in udf? Why do we need additional parameters to deal with null?

In this example, why not pass Option<Fn>: dont_intercept_null to indicate whether customized null handling? None for default, Some for UDF.

duckdb.create_function('dont_intercept', dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("""
	select dont_intercept(NULL)
""").fetchall()
print(res)
# [(None,)]

duckdb.remove_function('dont_intercept')
duckdb.create_function('dont_intercept', dont_intercept_null, [BIGINT], BIGINT, null_handling='special')
res = duckdb.sql("""
	select dont_intercept(NULL)
""").fetchall()
print(res)

jayzhan211 avatar Aug 29 '23 11:08 jayzhan211

I plan to review this PR later -- I am on vacation this week so my response will likely be delayed

alamb avatar Aug 29 '23 12:08 alamb

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 03 '24 01:05 github-actions[bot]