HBASE-28195 set start row as prefix if a scan with PrefixFilter

Open frostruan opened this issue 2 years ago • 7 comments

This PR introduces a ScanRangeOptimizer that tries to avoid reading unnecessary data, based on the filters the user sets.

For example, if a user wants to scan data where rowkey > 'hhh' and rowkey < 'mmm', the optimizer can set the start row to 'hhh' and the stop row to 'mmm'. Compared to the default start row and stop row, EMPTY_START_ROW and EMPTY_STOP_ROW, this will help speed up the scan request.
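The range derivation described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class `ScanRangeSketch`, the `Op` enum, and the `and` method are hypothetical names, only predicates combined with AND are handled, and `Arrays.compareUnsigned` requires Java 9 or later. The derived range is a superset of the matching rows, so the original filter still has to be applied.

```java
import java.util.Arrays;

/** Hypothetical sketch: narrow a scan's [start, stop) range from rowkey comparisons. */
final class ScanRangeSketch {
  enum Op { GREATER, GREATER_OR_EQUAL, LESS, LESS_OR_EQUAL }

  // Stands in for EMPTY_START_ROW / EMPTY_STOP_ROW.
  static final byte[] EMPTY = new byte[0];

  byte[] startRow = EMPTY; // inclusive lower bound; empty means "from the first row"
  byte[] stopRow = EMPTY;  // exclusive upper bound; empty means "to the last row"

  /** Tighten the range with one predicate of the form "rowkey op value". */
  void and(Op op, byte[] value) {
    switch (op) {
      case GREATER:
      case GREATER_OR_EQUAL:
        // Starting at value is safe for both; for GREATER the filter still drops value itself.
        if (Arrays.compareUnsigned(value, startRow) > 0) {
          startRow = value;
        }
        break;
      case LESS:
        if (stopRow.length == 0 || Arrays.compareUnsigned(value, stopRow) < 0) {
          stopRow = value;
        }
        break;
      case LESS_OR_EQUAL:
        // value + 0x00 is the smallest key greater than value, so the
        // exclusive stop row still covers value itself.
        byte[] next = Arrays.copyOf(value, value.length + 1);
        if (stopRow.length == 0 || Arrays.compareUnsigned(next, stopRow) < 0) {
          stopRow = next;
        }
        break;
    }
  }
}
```

Feeding in rowkey > 'hhh' and rowkey < 'mmm' leaves `startRow` at 'hhh' and `stopRow` at 'mmm', matching the example in the description.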

frostruan avatar Nov 12 '23 13:11 frostruan

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 2m 23s Docker mode activated.
-0 :warning: yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 :ok: mvndep 0m 12s Maven dependency ordering for branch
+1 :green_heart: mvninstall 2m 30s master passed
+1 :green_heart: compile 0m 31s master passed
+1 :green_heart: shadedjars 5m 14s branch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 25s master passed
_ Patch Compile Tests _
+0 :ok: mvndep 0m 12s Maven dependency ordering for patch
+1 :green_heart: mvninstall 2m 18s the patch passed
+1 :green_heart: compile 0m 29s the patch passed
+1 :green_heart: javac 0m 29s the patch passed
+1 :green_heart: shadedjars 5m 10s patch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 24s the patch passed
_ Other Tests _
+1 :green_heart: unit 1m 49s hbase-common in the patch passed.
-1 :x: unit 1m 3s hbase-client in the patch failed.
24m 3s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5514
Optional Tests javac javadoc unit shadedjars compile
uname Linux dd7ad45a760d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e806350bd0
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-client.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/testReport/
Max. process+thread count 360 (vs. ulimit of 30000)
modules C: hbase-common hbase-client U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 12 '23 13:11 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 34s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 :ok: mvndep 0m 12s Maven dependency ordering for branch
+1 :green_heart: mvninstall 3m 54s master passed
+1 :green_heart: compile 0m 47s master passed
+1 :green_heart: shadedjars 6m 15s branch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 38s master passed
_ Patch Compile Tests _
+0 :ok: mvndep 0m 12s Maven dependency ordering for patch
+1 :green_heart: mvninstall 3m 19s the patch passed
+1 :green_heart: compile 0m 40s the patch passed
+1 :green_heart: javac 0m 40s the patch passed
+1 :green_heart: shadedjars 6m 0s patch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 40s the patch passed
_ Other Tests _
+1 :green_heart: unit 2m 50s hbase-common in the patch passed.
-1 :x: unit 1m 27s hbase-client in the patch failed.
28m 53s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5514
Optional Tests javac javadoc unit shadedjars compile
uname Linux cb796ef6263c 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e806350bd0
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-client.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/testReport/
Max. process+thread count 370 (vs. ulimit of 30000)
modules C: hbase-common hbase-client U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 12 '23 13:11 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 1m 18s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 :ok: mvndep 0m 10s Maven dependency ordering for branch
+1 :green_heart: mvninstall 2m 51s master passed
+1 :green_heart: compile 1m 13s master passed
+1 :green_heart: checkstyle 0m 32s master passed
+1 :green_heart: spotless 0m 43s branch has no errors when running spotless:check.
+1 :green_heart: spotbugs 1m 16s master passed
_ Patch Compile Tests _
+0 :ok: mvndep 0m 12s Maven dependency ordering for patch
+1 :green_heart: mvninstall 2m 39s the patch passed
+1 :green_heart: compile 1m 12s the patch passed
+1 :green_heart: javac 1m 12s the patch passed
-0 :warning: checkstyle 0m 16s hbase-client: The patch generated 3 new + 4 unchanged - 0 fixed = 7 total (was 4)
+1 :green_heart: whitespace 0m 0s The patch has no whitespace issues.
+1 :green_heart: hadoopcheck 9m 16s Patch does not cause any errors with Hadoop 3.2.4 3.3.6.
-1 :x: spotless 0m 18s patch has 66 errors when running spotless:check, run spotless:apply to fix.
+1 :green_heart: spotbugs 1m 31s the patch passed
_ Other Tests _
+1 :green_heart: asflicense 0m 19s The patch does not generate ASF License warnings.
29m 46s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5514
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux c61adbdc9a94 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e806350bd0
Default Java Eclipse Adoptium-11.0.17+8
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-client.txt
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 78 (vs. ulimit of 30000)
modules C: hbase-common hbase-client U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5514/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 12 '23 13:11 Apache-HBase

Isn't there already a setStartStopRowForPrefixScan method on Scan? I think it serves exactly the same purpose...
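For context, a prefix scan can be turned into a start/stop range by using the prefix as the start row and the "next" key after the prefix as the exclusive stop row. The sketch below shows that common technique in isolation; `PrefixRangeSketch` and `stopRowForPrefix` are hypothetical names, and this is not a copy of HBase's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch: compute the exclusive stop row that closes a prefix scan. */
final class PrefixRangeSketch {
  /** Returns the stop row for the prefix, or an empty array if there is no upper bound. */
  static byte[] stopRowForPrefix(byte[] prefix) {
    int end = prefix.length;
    // Trailing 0xFF bytes cannot be incremented, so drop them.
    while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
      end--;
    }
    if (end == 0) {
      // Empty or all-0xFF prefix: the scan has to run to the end of the table.
      return new byte[0];
    }
    byte[] stop = Arrays.copyOf(prefix, end);
    // Smallest key that sorts after every key starting with the prefix.
    stop[end - 1]++;
    return stop;
  }
}
```

With the prefix 'abc' the start row is 'abc' and the stop row is 'abd'. A range such as rowkey > 'hhh' and rowkey < 'mmm' cannot be expressed as a single prefix.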

Apache9 avatar Nov 13 '23 03:11 Apache9

Thanks for reviewing, Duo. Yes, the setStartStopRowForPrefixScan method works for prefix filtering, but it cannot handle range filtering. Maybe the title misled you. What I want to introduce here is something like the query optimizer subsystem in an RDBMS. It will optimize the scan range based on the filters that the user sets. For example, if a user wants to scan data where rowkey > 'hhh' and rowkey < 'mmm', the optimizer can set the start row to 'hhh' and the stop row to 'mmm'. Compared to the default start row and stop row, EMPTY_START_ROW and EMPTY_STOP_ROW, this will help speed up the scan request.

frostruan avatar Nov 13 '23 05:11 frostruan

Then let's change the title and post a simple design doc to discuss first? I think introducing a new mechanism is fine, but we need to discuss it first. At the very least, changing the Scan object that is passed in may break our users' code...
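One possible way to address the concern about modifying the caller's object is for an optimizer to work on a copy. The sketch below only illustrates that idea and is not something decided in this thread: `ScanRequest` is a hypothetical stand-in rather than HBase's Scan class, and `withRange` is an assumed helper name.

```java
import java.util.Arrays;

/** Hypothetical sketch: narrow a copy so the caller's request is left untouched. */
final class CopyOnOptimizeSketch {
  /** Minimal stand-in for a scan request; not HBase's Scan class. */
  static final class ScanRequest {
    byte[] startRow = new byte[0];
    byte[] stopRow = new byte[0];

    ScanRequest copy() {
      ScanRequest c = new ScanRequest();
      c.startRow = Arrays.copyOf(startRow, startRow.length);
      c.stopRow = Arrays.copyOf(stopRow, stopRow.length);
      return c;
    }
  }

  /** Returns a narrowed copy; the original keeps its start and stop rows. */
  static ScanRequest withRange(ScanRequest original, byte[] start, byte[] stop) {
    ScanRequest optimized = original.copy();
    optimized.startRow = start.clone();
    optimized.stopRow = stop.clone();
    return optimized;
  }
}
```

Code that inspects or reuses its own request after issuing the scan would then keep seeing the values it set.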

Apache9 avatar Nov 25 '23 04:11 Apache9

OK. Thanks for your advice, Duo. Let me prepare the design doc first.

frostruan avatar Nov 25 '23 13:11 frostruan