drill icon indicating copy to clipboard operation
drill copied to clipboard

DRILL-5691: enhance scalar sub queries checking for the cartesian join

Open weijietong opened this issue 7 years ago • 21 comments

weijietong avatar Jul 27 '17 12:07 weijietong

The build process is good at my laptop. anyone please work this out.

weijietong avatar Jul 27 '17 13:07 weijietong

@weijietong thanks for the PR.

  1. Is it possible to add unit tests?
  2. Can you please add in method description that a table scan with only one row should also be considered as a scalar as well.
  3. Please fix indention in if statements and change currentrel to camel case [1].

[1] https://drill.apache.org/docs/apache-drill-contribution-guidelines/

arina-ielchiieva avatar Aug 02 '17 10:08 arina-ielchiieva

@arina-ielchiieva I have corrected the codes as you guide. But sorry for the unit tests, I have tried a long time to simulate a scan with 1 row ,but failed to do that. The row count of a scan is fetched from the AbstractGroupScan.getScanStats method. At my plugin ,I override this method to ensure it will return 1 .

weijietong avatar Aug 04 '17 10:08 weijietong

@weijietong regarding the unit test, I have tried to reproduce the problem and written the following unit test:

  @Test
  public void test() throws Exception {
    FileSystem fs = null;
    try {
      fs = FileSystem.get(new Configuration());

      // create table with partition pruning
      test("use dfs_test.tmp");
      String tableName = "table_with_pruning";
      Path dataFile = new Path(TestTools.getWorkingPath(),"src/test/resources/parquet/alltypes_required.parquet");
      test("create table %s partition by (col_int) as select * from dfs.`%s`", tableName, dataFile);

      // generate metadata
      test("refresh table metadata `%s`", tableName);
      
      // execute query
      String query = String.format("select count(distinct col_int), count(distinct col_chr) from `%s` where col_int = 45436", tableName);
      test(query);

    } finally {
      if (fs != null) {
        fs.close();
      }
    }
  }

AbstractGroupScan.getScanStats method returns one row but it does not fail. Can you please take a look?

arina-ielchiieva avatar Aug 08 '17 10:08 arina-ielchiieva

@arina-ielchiieva your test case can not reproduce the error . You can search the dev email to find the origin error description with the keyword "Drill query planning error". Your query already satisfy the NestedLoopJoinPrule. My case is that I add another rule to change the Aggregate-->Aggregate-->Scan to Scan as the transformed Scan relnode already holding the count(distinct ) value. When this transformation occurs, the NestedLoopJoinPrule's checkPreconditions method will invoke JoinUtils.hasScalarSubqueryInput. Then it will fail, as the transformed relnode has no aggregate node which does not satisfy the current scalar rule.

I think it's hard to reproduce this error without a specific rule like what I do. the precondition is:

  1. a nested loop join
  2. no (aggregate--> aggregate) count distinct relation nodes in the plan
  3. the row number of one child of the nested loop join is 1 .

I wonder if the enhanced code does not break the current unit test ,it will be ok.

weijietong avatar Aug 08 '17 15:08 weijietong

@weijietong thanks for code formatting.

I wonder if the enhanced code does not break the current unit test ,it will be ok.

Can you please check this? Did all unit tests pass after the change?

arina-ielchiieva avatar Aug 09 '17 14:08 arina-ielchiieva

@arina-ielchiieva have passed all the unit tests under java-exec bundle , got some errors which are sure not associated with this change, maybe my branch was old from the master.

TestConstantFolding.testConstantFolding_allTypes:163 » org.apache.drill.commo... TestCustomUserAuthenticator.positiveUserAuth » UserRemote SYSTEM ERROR: URISyn... TestCustomUserAuthenticator.positiveUserAuthAfterNegativeUserAuth » UserRemote TestInfoSchema.selectFromAllTables » UserRemote SYSTEM ERROR: URISyntaxExcepti... TestViewSupport.infoSchemaWithView:355->BaseTestQuery.testRunAndReturn:331 » Rpc TestInfoSchemaFilterPushDown.testFilterPushdown_NonEqual » UserRemote SYSTEM E... TestParquetScan.testSuccessFile:64->BaseTestQuery.testRunAndReturn:331 » Rpc o... TestTpchDistributedConcurrent.testConcurrentQueries:190 » test timed out afte...

I also found the error to do mvn test ,got the same error as JIRA DRILL-4104 . through mvn test -pl exec/java-exec this method,I got the unit test result. I wonder how the devs do the test result .

weijietong avatar Aug 10 '17 15:08 weijietong

On mine environment all tests pass. I used to have troubles but long time ago. I had to fix issues with timezones, memory usage MAVEN_OPTS="-Xmx2048m -XX:MaxPermSize=1024m". Also I am pretty sure that some test have failed because the others did, usually the ones with Rpc exception. Maybe UserRemote SYSTEM ERROR: URISyntaxException is connected with your env. You should check that. Also please rebase on the latest master (you can do push -f again), also if TestTpchDistributedConcurrent.testConcurrentQueries will be failing again, try to disable it but make sure it passes separately. I usually run all test mvn clean install not by modules.

arina-ielchiieva avatar Aug 10 '17 15:08 arina-ielchiieva

@arina-ielchiieva other tests have passed separately and time zone related test cases have been corrected in PR 5717

weijietong avatar Aug 20 '17 12:08 weijietong

Thanks @weijietong. LGTM. @amansinha100 could you please do the final CR since you are familiar with this issue?

arina-ielchiieva avatar Aug 21 '17 13:08 arina-ielchiieva

@arina-ielchiieva thanks for the advice, have corrected that.

weijietong avatar Aug 23 '17 12:08 weijietong

+1, LGTM.

arina-ielchiieva avatar Aug 23 '17 12:08 arina-ielchiieva

@arina-ielchiieva @amansinha100 isScalarSubquery method has been refactored , please review.

weijietong avatar Sep 06 '17 13:09 weijietong

@arina-ielchiieva, looks like Weijie added a couple of commits since your +1. Can you take a look at them?

paul-rogers avatar Sep 09 '17 22:09 paul-rogers

@arina-ielchiieva @amansinha100 any further advice ?

weijietong avatar Sep 15 '17 09:09 weijietong

@arina-ielchiieva, @amansinha100 can you take another look at this one and see if your concerns have been addressed?

paul-rogers avatar Oct 16 '17 18:10 paul-rogers

@amansinha100 I have removed the override of getMaxRowCount method of TableScan type to avoid your worry about unexpected results . This PR's target adjusts to just enhance the current cartesian join by leveraging Calcite's RelMdMaxRowCount method. I will solve our case specially at our env.

weijietong avatar Oct 29 '17 01:10 weijietong

ping @arina-ielchiieva @amansinha100

weijietong avatar Nov 03 '17 23:11 weijietong

@weijietong I ran the functional tests with the PR and saw some failures. While debugging those failures, I found the simple scalar aggregate case works but anything with compound aggregate expressions throws a CannotPlanException, for example:

explain plan for select count(*) from cp.tpch/lineitem.parquet l where l_quantity < (select max(l2.l_quantity) + 1 from cp.tpch/lineitem.parquet l2)

Here, the right side of the inequality join is actually a scalar but the JoinUtils.isScalar() method does not detect it correctly. The reason is the right side is a DrillProjectRel whose input is a RelSubSet. The RelSubSet has a DrillAggregateRel but it looks like currently Calcite does not detect this is a scalar aggregate if it occurs as part of a RelSubSet. I see Calcite has a comment in [1] which points to an open JIRA CALCITE-1048.

[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdMaxRowCount.java#L165

amansinha100 avatar Nov 08 '17 19:11 amansinha100

@weijietong I am interested in getting your basic changes in. It is unfortunate we are running into this issue with RelSubSet. Let me see if I can make some changes on top of your changes (I will keep your authorship) and have it pass all our functional regression tests.

amansinha100 avatar Nov 08 '17 19:11 amansinha100

@amansinha100 thanks for sharing the information. Got your point. I think your propose on CALCITE-1048 is possible. Since CALCITE-794 has completed at version 1.6 ,it seems there's a more perfect solution( to get the least max number of all the rels of the RelSubSet). But due to Drill's Caclite version is still based on 1.4 , I support your current temp solution. Only wonder that whether the explicitly searched RelNode's (such as DrillAggregateRel) maxRowCount can represent the best RelNode's maxRowCount ?

weijietong avatar Nov 10 '17 05:11 weijietong