calcite icon indicating copy to clipboard operation
calcite copied to clipboard

[CALCITE-4188] Support EnumerableBatchNestedLoopJoin inJdbcToEnumerab…

Open angelzouxin opened this issue 5 years ago • 7 comments

https://issues.apache.org/jira/browse/CALCITE-4188

angelzouxin avatar Aug 21 '20 08:08 angelzouxin

Dear @amaliujia, do you have any more comments for this PR?

angelzouxin avatar Sep 27 '20 11:09 angelzouxin

@angelzouxin will take a look soon. Let's see whether we can commit this PR to 1.26.0

amaliujia avatar Sep 27 '20 17:09 amaliujia

@amaliujia Do you think this should be in the 1.26.0 ? And do you have time to review it ?

danny0405 avatar Sep 28 '20 02:09 danny0405

@danny0405 my impression is 1.26.0 will be cut by end of this week. I will take a look at this PR soon but it seems that I need more context to understand about this change so I think it might not be feasible to merge this PR before 1.26.0.

I have removed the 1.26.0 from this JIRA (I added that tag before).

amaliujia avatar Sep 28 '20 03:09 amaliujia

I think the PR is in pretty good shape and we should try to get it in for the next release. Are you planning to push it forward @amaliujia or should I take over?

zabetak avatar Nov 30 '20 09:11 zabetak

@zabetak please feel free to review this PR and merge it when you think it is ready.

amaliujia avatar Nov 30 '20 20:11 amaliujia

This is a super valuable PR, just wanted to bring it up again and also give some feedback on it:

I rebased this locally off of main today and built it, it works except in certain odd circumstances. For instance, here's the following query being run:

String query = "SELECT dname, MULTISET(" +
               "  SELECT ename, MULTISET(" +
               "    SELECT 1 FROM (VALUES (10), (20), (30), (40)) AS T(d_no) WHERE t.d_no IN (10, 20)" +
               "  ) FROM emp AS e WHERE deptno = dept.deptno" +
               ") FROM dept";

image

It also worked with:

String query2 = "select\n" +
                "    \"first_name\",\n" +
                "    multiset(\n" +
                "        select \"store_name\", multiset(\n" +
                "            select \"fname\"\n" +
                "            from \"customer\"\n" +
                "            where \"customer\".\"customer_region_id\" <> 0 \n" +
                "        ) \n" +
                "        from \"store\"\n" +
                "        where \"store_id\" = \"employee\".\"store_id\"\n" +
                "    )\n" +
                "from \"employee\"\n" +
                "limit 2";

What I have noticed though is that if the multiset returns an empty relation in a correlated subquery, you get an error about $cor0 variable being null.

To trigger this, you can change the where customer.customer_region_id clause to be:

where "customer"."customer_region_id" = "store"."region_id"

Running that will give you:

Correlation variable $cor0 should be defined
java.lang.AssertionError: Correlation variable $cor0 should be defined
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.getCorrelVariableGetter(EnumerableRelImplementor.java:472)
	at org.apache.calcite.adapter.jdbc.JdbcToEnumerableConverter.generateCorrelate(JdbcToEnumerableConverter.java:351)
	at org.apache.calcite.adapter.jdbc.JdbcToEnumerableConverter.implement(JdbcToEnumerableConverter.java:188)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:104)
	at org.apache.calcite.adapter.enumerable.EnumerableCorrelate.implement(EnumerableCorrelate.java:113)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:104)
	at org.apache.calcite.adapter.enumerable.EnumerableCalc.implement(EnumerableCalc.java:118)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:104)
	at org.apache.calcite.adapter.enumerable.EnumerableCollect.implement(EnumerableCollect.java:81)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:104)
	at org.apache.calcite.adapter.enumerable.EnumerableNestedLoopJoin.implement(EnumerableNestedLoopJoin.java:160)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:104)
	at org.apache.calcite.adapter.enumerable.EnumerableLimit.implement(EnumerableLimit.java:98)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:104)
	at org.apache.calcite.adapter.enumerable.EnumerableCalc.implement(EnumerableCalc.java:118)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:111)
	at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:114)
	at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1131)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:324)

And there is one more error which is less understandable to me, you can produce this one by modifying the emps deps query to:

String query = "SELECT dname, MULTISET(" +
               "  SELECT ename, MULTISET(" +
               "    SELECT * FROM dept AS d WHERE d.deptno = e.deptno" +
               "  ) FROM emp AS e WHERE deptno = dept.deptno" +
               ") FROM dept";
java.sql.SQLException: Error while executing SQL: Index 7 out of bounds for length 3
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 7 out of bounds for length 3
	at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:77)
	at org.apache.calcite.rel.core.TableScan.project(TableScan.java:164)
	at org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:1282)
	... 66 more

I would offer to fix these and push this PR through to finished, but I still don't really have a clue about how Calcite works under the hood and that fancy relational algebra stuff =/

Hope this info is useful at least!

GavinRay97 avatar Jul 29 '22 23:07 GavinRay97