HBASE-28584 RS SIGSEGV under heavy replication load
Clone the cells used to apply mutations on the local cluster. Some operations may still be in flight when other in-flight mutations fail to apply. That failure triggers failure handling, which includes releasing the buffer underlying the cellScanner that supplies the cells.
See also https://issues.apache.org/jira/browse/HBASE-28584?focusedCommentId=17869030&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17869030
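To illustrate the hazard described above, here is a minimal, self-contained sketch. It does not use HBase classes: `BufferBackedValue`, `cloneToHeap`, and the reuse of a heap `ByteBuffer` to simulate a released buffer are all assumptions made for illustration. With a real off-heap buffer, reading after release could crash the process rather than return stale bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CloneBeforeReleaseSketch {

  /** Hypothetical stand-in for a cell whose bytes live in a shared buffer. */
  public static final class BufferBackedValue {
    private final ByteBuffer buf;
    private final int offset;
    private final int length;

    public BufferBackedValue(ByteBuffer buf, int offset, int length) {
      this.buf = buf;
      this.offset = offset;
      this.length = length;
    }

    /** Reads the bytes from the shared buffer at call time. */
    public byte[] read() {
      byte[] out = new byte[length];
      for (int i = 0; i < length; i++) {
        out[i] = buf.get(offset + i); // absolute get, independent of position
      }
      return out;
    }

    /** Deep copy: the returned array no longer depends on the shared buffer. */
    public byte[] cloneToHeap() {
      return read();
    }
  }

  /** Returns {clonedValue, viewValue} as observed after the buffer is reused. */
  public static String[] demo() {
    ByteBuffer shared = ByteBuffer.allocate(16);
    shared.put("row1".getBytes(StandardCharsets.UTF_8));

    BufferBackedValue view = new BufferBackedValue(shared, 0, 4);
    byte[] cloned = view.cloneToHeap(); // copy taken before any release

    // Simulate failure handling releasing the buffer and someone else reusing it.
    shared.clear();
    shared.put("XXXX".getBytes(StandardCharsets.UTF_8));

    String clonedNow = new String(cloned, StandardCharsets.UTF_8);
    String viewNow = new String(view.read(), StandardCharsets.UTF_8);
    return new String[] { clonedNow, viewNow };
  }

  public static void main(String[] args) {
    String[] r = demo();
    System.out.println("cloned=" + r[0] + " view=" + r[1]);
  }
}
```

The cloned copy keeps its original contents, while the view observes whatever now occupies the buffer. The cost of this approach is the extra copy per cell.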
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 41s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. | |
| _ master Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 17s | Maven dependency ordering for branch | |
| +1 :green_heart: | mvninstall | 4m 22s | master passed | |
| +1 :green_heart: | compile | 4m 30s | master passed | |
| +1 :green_heart: | checkstyle | 1m 8s | master passed | |
| +1 :green_heart: | spotbugs | 2m 42s | master passed | |
| +1 :green_heart: | spotless | 0m 59s | branch has no errors when running spotless:check. | |
| _ Patch Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 11s | Maven dependency ordering for patch | |
| +1 :green_heart: | mvninstall | 3m 36s | the patch passed | |
| +1 :green_heart: | compile | 4m 16s | the patch passed | |
| +1 :green_heart: | javac | 4m 16s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 1m 4s | the patch passed | |
| +1 :green_heart: | spotbugs | 2m 46s | the patch passed | |
| +1 :green_heart: | hadoopcheck | 12m 56s | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. | |
| +1 :green_heart: | spotless | 0m 45s | patch has no errors when running spotless:check. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | asflicense | 0m 21s | The patch does not generate ASF License warnings. | |
| | | 48m 41s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6124 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 30055f172d95 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / e0a31621f67dec9c6da8a0607e9bdb28783afca5 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 84 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
OK, the ServerCall class has a retainByWAL method, which counts the extra references to the ServerCall, mainly for the CellScanners.
I think we can just change the method to retain, meaning that we want to retain the call for other uses even after the RPC call is done, and use it here as well.
Anyway, since the current PR has been well tested in production, I think we can apply it first and open another issue for the optimization.
Thanks.
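The reference-counting idea mentioned in the comment above could look roughly like the following sketch. This is not the ServerCall API: `RefCountedResource`, `retain`, `release`, and `isFreed` are hypothetical names, and the "free" step here only flips a flag where a real implementation would return the buffer to its pool.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {

  /** Hypothetical resource that is freed only when the last holder releases it. */
  public static final class RefCountedResource {
    // The creator (for example, the RPC handling path) holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean freed = false;

    /** Takes an extra reference, e.g. for an operation that outlives the RPC. */
    public void retain() {
      while (true) {
        int current = refs.get();
        if (current <= 0) {
          throw new IllegalStateException("resource already freed");
        }
        if (refs.compareAndSet(current, current + 1)) {
          return;
        }
      }
    }

    /** Drops one reference; returns true if this call freed the resource. */
    public boolean release() {
      while (true) {
        int current = refs.get();
        if (current <= 0) {
          throw new IllegalStateException("released more times than retained");
        }
        if (refs.compareAndSet(current, current - 1)) {
          if (current == 1) {
            freed = true; // a real implementation would release the buffer here
            return true;
          }
          return false;
        }
      }
    }

    public boolean isFreed() {
      return freed;
    }
  }

  public static void main(String[] args) {
    RefCountedResource res = new RefCountedResource();
    res.retain();                      // in-flight mutation keeps the cells alive
    System.out.println(res.release()); // RPC finishes: not freed yet
    System.out.println(res.release()); // mutation finishes: freed now
  }
}
```

With this pattern, failure handling on the RPC side drops only its own reference, so an in-flight operation that called `retain` keeps the underlying buffer valid until it calls `release` itself, and no copy of the cells is needed.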
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 4m 11s | Docker mode activated. | |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck | |
| _ Prechecks _ | ||||
| _ master Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 9s | Maven dependency ordering for branch | |
| +1 :green_heart: | mvninstall | 4m 33s | master passed | |
| +1 :green_heart: | compile | 1m 43s | master passed | |
| +1 :green_heart: | javadoc | 1m 15s | master passed | |
| +1 :green_heart: | shadedjars | 6m 12s | branch has no errors when building our shaded downstream artifacts. | |
| _ Patch Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 13s | Maven dependency ordering for patch | |
| +1 :green_heart: | mvninstall | 3m 0s | the patch passed | |
| +1 :green_heart: | compile | 1m 21s | the patch passed | |
| +1 :green_heart: | javac | 1m 21s | the patch passed | |
| +1 :green_heart: | javadoc | 0m 47s | the patch passed | |
| +1 :green_heart: | shadedjars | 5m 20s | patch has no errors when building our shaded downstream artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 2m 29s | hbase-common in the patch passed. | |
| +1 :green_heart: | unit | 229m 24s | hbase-server in the patch passed. | |
| | | 265m 42s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6124 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 265f2a29681d 5.4.0-177-generic #197-Ubuntu SMP Thu Mar 28 22:45:47 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / e0a31621f67dec9c6da8a0607e9bdb28783afca5 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/testReport/ |
| Max. process+thread count | 5172 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6124/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
Any updates here?
Thanks.
I think we should fix this in newer releases.
If you are all OK with it, I could try to implement the reference-counting approach to solve the problem.
@apurtell @virajjasani Thoughts?
Thanks.
I thought refcounting would be complex, but I am not opposed to it as a different solution. If and when we have that, we could remove the copying.
We would not need this change if https://github.com/apache/hbase/pull/6263 solves the problem instead.
Fixed by https://github.com/apache/hbase/pull/6263