coral icon indicating copy to clipboard operation
coral copied to clipboard

[Calcite] Showcase bug related to JOIN expression

Open findinpath opened this issue 1 year ago • 2 comments

Consider the following scenario to translate via Coral:

CREATE TABLE test.table_join_view_scenario_simple_table (key string);
CREATE TABLE test.table_join_view_scenario_table_nested_columns (
      studentid struct<idtype:string,id:string>, 
      details struct<lastupdated:timestamp,name:string, misc:struct<enrollmentdate:date,major:string>>);

CREATE VIEW test.view_table_join_view_scenario_table_nested_columns AS 
SELECT studentid.id student_id, details.name as student_name, details.misc.major major, details.lastUpdated details_last_updated 
FROM test.table_join_view_scenario_table_nested_columns

CREATE VIEW test.view_table_join_with_view_table_with_nested_columns AS 
SELECT v.student_id, v.student_name, v.details_last_updated
FROM test.view_table_join_view_scenario_table_nested_columns v 
INNER JOIN test.table_join_view_scenario_simple_table t 
     ON concat('U', v.student_id) = t.key;

When we have an function call in the JOIN expression like in the example provided:

concat('U', v.student_id) = t.key

we are then having a new LogicalProject (different from the leaf LogicalProject corresponding to the view view_table_join_view_scenario_table_nested_columns see org.apache.calcite.sql2rel.SqlToRelConverter#leaves) which includes the expression CAST(concat('U', $0.id)):VARCHAR(2147483647) RexCall . This project is not considered a leaf.

In the flattening algorithm from org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard#flatten therefore the LogicalProject with the expressions

0 = {RexFieldAccess@8734} "$0.id"
1 = {RexFieldAccess@8735} "$1.name"
2 = {RexFieldAccess@8736} "$1.misc.major"
3 = {RexFieldAccess@8737} "$1.lastupdated"
4 = {RexCall@8738} "CAST(concat('U', $0.id)):VARCHAR(2147483647)"

will be flattened therefore again (because it is not a leaf) and we obtain therefore the LogicalTableScan corresponding to the table table_join_view_scenario_table_nested_columns which contains only two fields:

"studentid" -> {RelRecordType@8755} "RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id)"
"details" -> {RelRecordType@8757} "RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc)"

Therefore the type corresponding to the LogicalTableScan having only two fields is being returned for usage in SqlToRelConverter.convertIdentifier.

However, details_last_updated has the index 3 in the LogicalProject

0 = {RexFieldAccess@8734} "$0.id"
1 = {RexFieldAccess@8735} "$1.name"
2 = {RexFieldAccess@8736} "$1.misc.major"
3 = {RexFieldAccess@8737} "$1.lastupdated"

This is why we stumble in an AssertionError in RexBuilder.makeFieldAccess :

java.lang.AssertionError: Field ordinal 3 is invalid for  type 

org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard#flatten source code:

    public void flatten(
        List<RelNode> rels,
        int systemFieldCount,
        int[] start,
        List<Pair<RelNode, Integer>> relOffsetList) {
      for (RelNode rel : rels) {
        if (leaves.contains(rel) || rel instanceof LogicalMatch) {
          relOffsetList.add(
              Pair.of(rel, start[0]));
          start[0] += rel.getRowType().getFieldCount();
        } else {
          if (rel instanceof LogicalJoin
              || rel instanceof LogicalAggregate) {
            start[0] += systemFieldCount;
          }
          flatten(
              rel.getInputs(),
              systemFieldCount,
              start,
              relOffsetList);
        }
      }
    }

This PR serves only for showcasing a potential unidentified bug in calcite-core

Stacktrace:

java.lang.AssertionError: Field ordinal 3 is invalid for  type 'RecordType:peek_no_expand(RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id) studentid, RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc) details)'
	at org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3725)
	at org.apache.calcite.sql2rel.SqlToRelConverter.access$2200(SqlToRelConverter.java:217)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4796)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4092)
	at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4656)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3939)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:670)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181)
	at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:63)
	at com.linkedin.coral.common.ToRelConverter.toRel(ToRelConverter.java:157)
	at com.linkedin.coral.common.ToRelConverter.convertView(ToRelConverter.java:124)
	at com.linkedin.coral.trino.rel2trino.HiveToTrinoConverterTest.testViews(HiveToTrinoConverterTest.java:44)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
	at org.testng.TestRunner.privateRun(TestRunner.java:758)
	at org.testng.TestRunner.run(TestRunner.java:613)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
	at org.testng.SuiteRunner.run(SuiteRunner.java:240)

What changes are proposed in this pull request, and why are they necessary?

How was this patch tested?

findinpath avatar Nov 15 '23 20:11 findinpath

Thanks for the PR. Could you simplify the table/view names in the test case? Also update the description to highlight the key reason why the test case fails (you could use the simplified names from the test). Right now, it is a bit hard to reverse engineer the problem from the test code alone.

wmoustafa avatar Nov 15 '23 22:11 wmoustafa

@wmoustafa AC

findinpath avatar Nov 16 '23 04:11 findinpath