calcite icon indicating copy to clipboard operation
calcite copied to clipboard

[CALCITE-3513] Unify TableFunction implementor's NullPolicy and its behavior

Open DonnyZone opened this issue 5 years ago • 11 comments

Details are illustrated in CALCITE-3513, Table function's implementation should be NullPolicy.NONE.

DonnyZone avatar Nov 18 '19 12:11 DonnyZone

hi @DonnyZone This modification is more rigorous than before, and the scope is even narrower. Is there any reason for this change?

XuQianJin-Stars avatar Nov 25 '19 12:11 XuQianJin-Stars

@XuQianJin-Stars I'm refactoring the codegen component now and try to unify the inconsistencies buried under current codegen framework. This PR need to do more work. I will update it these couple of days.

DonnyZone avatar Nov 25 '19 12:11 DonnyZone

Spelling: behavior not beheavior

julianhyde avatar Nov 26 '19 02:11 julianhyde

ScalarFunctionImpl determines NullPolicy based on annotations, while TableFunctionImpl sets NullPolicy=ANY by default for all cases. However, according to the javadoc in TableFunction, null arguments are allowed.

  /**
   * Returns the row type of the table yielded by this function when
   * applied to given arguments. Only literal arguments are passed,
   * non-literal are replaced with default values (null, 0, false, etc).
   *
   * @param arguments arguments of a function call (only literal arguments
   *                  are passed, nulls for non-literal ones)
   * @return element type of the table (e.g. {@code Object[].class})
   */
  Type getElementType(List<Object> arguments);

DonnyZone avatar Nov 26 '19 05:11 DonnyZone

Hi, @julianhyde, Current now, I adopt the approach as that in ScalarFunctionImpl to dynamically determines the NullPolicy based on annotation. But as shown in TableFunctionTests, it seems not to be reasonable when a table function returns null. The code generated (org.apache.calcite.runtime.Enumerables.slice0(null)) will produce NPE.

DonnyZone avatar Nov 26 '19 05:11 DonnyZone

@vlsi Thanks for review. Actually, I do not have a clear direction for this work now. Let me clarify it and look forward to hearing your advice. At the beginning, I found that NullPolicy.ANY is not appropriate for all TableFunctions. For example, the query in TableFunctionTest#testMultipleScannableTableFunctionWithNamedParameters`.

select * from table("s"."Maze"(HEIGHT => 3, WIDTH => 5)`

It passes null argument (i.e., org.apache.calcite.util.Smalls.MazeTable.generate2(5, 3, null)), but the result is not null.

Then, I think NullPolicy.NONE may be more suitable. But it prevents users to define their own NullPolicy in their implementations.

In the end, I inclined to adopt the approach (getNullPolicy(Method m)) as that in ScalarFunctionImp. But I found that we do not have any standards to handle the case when a table function returns null, as the NPE in tests. Therefore, NullPolicy.ANY/STRICT/ARG0 seems to be meaningless.

DonnyZone avatar Jan 20 '20 10:01 DonnyZone

Ok, so your main question is "what should Calcite do when a table function returns NPE"?

Oracle makes it indistinguishable from an empty collection.

For instance:

select * from dual, table(cast(null as sys.odcinumberlist)) t;

DUMMY | COLUMN_VALUE
0 rows

select * from dual, table(sys.odcinumberlist()) t;

DUMMY | COLUMN_VALUE
0 rows

select * from dual, table(cast(null as sys.odcinumberlist))(+) t;
DUMMY | COLUMN_VALUE
    Y | [NULL]
1 row

select * from dual, table(sys.odcinumberlist())(+) t;
DUMMY | COLUMN_VALUE
    Y | [NULL]
1 row

Therefore, NullPolicy.ANY/STRICT/ARG0 seems to be meaningless

It is probably more to optimization and/or prevent invocation of the function on invalid inputs. For instance, if NullPolicy is set to ANY, then enumerable executor won't invoke the function in case there are null arguments.

On the other hand, treating null table function result as an empty table is a different issue.

vlsi avatar Jan 20 '20 10:01 vlsi

Is there any calcite connection to run the table function Because we are running this function in windows 7 operating system through sqlline. As sqlline.bat sqlline>!connect jdbc:calcite: admin admin

34venu avatar Jan 20 '20 11:01 34venu

This is good discussion, but can if you are changing the specification, can you move any conclusions over to the JIRA case?

I think that table functions should behave the same as regular functions with regard to null arguments.

IIRC, there are annotations that you can put against a UDF to control whether a null argument means a null result. If so, those annotations should also apply to table functions.

I agree with @vlsi that if a table function returns null it should be interpreted in SQL as an empty relation. (Vladimir, did you check what happens if a table function returns null and it is an argument to a consuming table function? I would guess that the consuming table function sees an empty, but not null, argument.)

julianhyde avatar Jan 20 '20 23:01 julianhyde

@34venu I'm not familiar with sqlline. The best option is to send email to the developers list (https://calcite.apache.org/community/#mailing-lists).

DonnyZone avatar Jan 21 '20 03:01 DonnyZone

Vladimir, did you check what happens if a table function returns null and it is an argument to a consuming table function?

null argument is just a null argument. Arguments are never converted to empty relations.

Note: Oracle schema does not make the distinction between regular functions and table functions. A function can be used as a table function if it returns a collection. However, the very same function can be used in SELECT or in PL/SQL, and null values are allowed for collections.

So I think it should be fine to return null from Calcite's table function, however, the executor should treat table(null) as an empty relation of the appropriate type.

vlsi avatar Jan 21 '20 06:01 vlsi