coral
coral copied to clipboard
[Calcite] Showcase bug related to JOIN expression
Consider the following scenario to translate via Coral:
CREATE TABLE test.table_join_view_scenario_simple_table (key string);
CREATE TABLE test.table_join_view_scenario_table_nested_columns (
studentid struct<idtype:string,id:string>,
details struct<lastupdated:timestamp,name:string, misc:struct<enrollmentdate:date,major:string>>);
CREATE VIEW test.view_table_join_view_scenario_table_nested_columns AS
SELECT studentid.id student_id, details.name as student_name, details.misc.major major, details.lastUpdated details_last_updated
FROM test.table_join_view_scenario_table_nested_columns
CREATE VIEW test.view_table_join_with_view_table_with_nested_columns AS
SELECT v.student_id, v.student_name, v.details_last_updated
FROM test.view_table_join_view_scenario_table_nested_columns v
INNER JOIN test.table_join_view_scenario_simple_table t
ON concat('U', v.student_id) = t.key;
When we have an function call in the JOIN
expression like in the example provided:
concat('U', v.student_id) = t.key
we are then having a new LogicalProject
(different from the leaf LogicalProject
corresponding to the view view_table_join_view_scenario_table_nested_columns
see org.apache.calcite.sql2rel.SqlToRelConverter#leaves
) which includes the expression CAST(concat('U', $0.id)):VARCHAR(2147483647)
RexCall
.
This project is not considered a leaf.
In the flattening algorithm from org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard#flatten
therefore the LogicalProject
with the expressions
0 = {RexFieldAccess@8734} "$0.id"
1 = {RexFieldAccess@8735} "$1.name"
2 = {RexFieldAccess@8736} "$1.misc.major"
3 = {RexFieldAccess@8737} "$1.lastupdated"
4 = {RexCall@8738} "CAST(concat('U', $0.id)):VARCHAR(2147483647)"
will be flattened therefore again (because it is not a leaf) and we obtain therefore the LogicalTableScan
corresponding to the table table_join_view_scenario_table_nested_columns
which contains only two fields:
"studentid" -> {RelRecordType@8755} "RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id)"
"details" -> {RelRecordType@8757} "RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc)"
Therefore the type corresponding to the LogicalTableScan
having only two fields is being returned for usage in SqlToRelConverter.convertIdentifier
.
However, details_last_updated
has the index 3
in the LogicalProject
0 = {RexFieldAccess@8734} "$0.id"
1 = {RexFieldAccess@8735} "$1.name"
2 = {RexFieldAccess@8736} "$1.misc.major"
3 = {RexFieldAccess@8737} "$1.lastupdated"
This is why we stumble in an AssertionError
in RexBuilder.makeFieldAccess
:
java.lang.AssertionError: Field ordinal 3 is invalid for type
org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard#flatten
source code:
public void flatten(
List<RelNode> rels,
int systemFieldCount,
int[] start,
List<Pair<RelNode, Integer>> relOffsetList) {
for (RelNode rel : rels) {
if (leaves.contains(rel) || rel instanceof LogicalMatch) {
relOffsetList.add(
Pair.of(rel, start[0]));
start[0] += rel.getRowType().getFieldCount();
} else {
if (rel instanceof LogicalJoin
|| rel instanceof LogicalAggregate) {
start[0] += systemFieldCount;
}
flatten(
rel.getInputs(),
systemFieldCount,
start,
relOffsetList);
}
}
}
This PR serves only for showcasing a potential unidentified bug in calcite-core
Stacktrace:
java.lang.AssertionError: Field ordinal 3 is invalid for type 'RecordType:peek_no_expand(RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id) studentid, RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc) details)'
at org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3725)
at org.apache.calcite.sql2rel.SqlToRelConverter.access$2200(SqlToRelConverter.java:217)
at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4796)
at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4092)
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4656)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3939)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:670)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627)
at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181)
at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:63)
at com.linkedin.coral.common.ToRelConverter.toRel(ToRelConverter.java:157)
at com.linkedin.coral.common.ToRelConverter.convertView(ToRelConverter.java:124)
at com.linkedin.coral.trino.rel2trino.HiveToTrinoConverterTest.testViews(HiveToTrinoConverterTest.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:758)
at org.testng.TestRunner.run(TestRunner.java:613)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
at org.testng.SuiteRunner.run(SuiteRunner.java:240)
What changes are proposed in this pull request, and why are they necessary?
How was this patch tested?
Thanks for the PR. Could you simplify the table/view names in the test case? Also update the description to highlight the key reason why the test case fails (you could use the simplified names from the test). Right now, it is a bit hard to reverse engineer the problem from the test code alone.
@wmoustafa AC