HDFS-16793. [SBN read] ObserverNN failed to select streaming inputStream from JournalNode
Description of PR
In our production environment, we encountered a case where the observer NameNode failed to select streaming input streams and hit a timeout exception. The related code is below:
@Override
public void selectInputStreams(Collection<EditLogInputStream> streams,
    long fromTxnId, boolean inProgressOk,
    boolean onlyDurableTxns) throws IOException {
  if (inProgressOk && inProgressTailingEnabled) {
    ...
  }
  // Timeout here.
  selectStreamingInputStreams(streams, fromTxnId, inProgressOk,
      onlyDurableTxns);
}
After looking into the code, we found that JournalNode performs one very expensive and redundant operation: it scans all of the edits in the last in-progress segment, which incurs IO. The related code is below:
public List<RemoteEditLog> getRemoteEditLogs(long firstTxId,
    boolean inProgressOk) throws IOException {
  File currentDir = sd.getCurrentDir();
  List<EditLogFile> allLogFiles = matchEditLogs(currentDir);
  List<RemoteEditLog> ret = Lists.newArrayListWithCapacity(
      allLogFiles.size());
  for (EditLogFile elf : allLogFiles) {
    if (elf.hasCorruptHeader() || (!inProgressOk && elf.isInProgress())) {
      continue;
    }
    // Here.
    if (elf.isInProgress()) {
      try {
        elf.scanLog(getLastReadableTxId(), true);
      } catch (IOException e) {
        LOG.error("got IOException while trying to validate header of " +
            elf + ". Skipping.", e);
        continue;
      }
    }
    if (elf.getFirstTxId() >= firstTxId) {
      ret.add(new RemoteEditLog(elf.firstTxId, elf.lastTxId,
          elf.isInProgress()));
    } else if (elf.getFirstTxId() < firstTxId && firstTxId <= elf.getLastTxId()) {
      // If the firstTxId is in the middle of an edit log segment. Return this
      // anyway and let the caller figure out whether it wants to use it.
      ret.add(new RemoteEditLog(elf.firstTxId, elf.lastTxId,
          elf.isInProgress()));
    }
  }
  Collections.sort(ret);
  return ret;
}
Expensive:
- The scan reads every edit in the in-progress segment, which incurs IO.
Redundant:
- The scan only serves to find the lastTxId of the in-progress segment.
- But the caller, getEditLogManifest(long sinceTxId, boolean inProgressOk) in Journal.java, ignores that lastTxId and instead returns getHighestWrittenTxId() to the NameNode as the lastTxId of the in-progress segment.
- So the scan operation is redundant.
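The redundancy argument above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code and not the patch itself: `ManifestSketch`, `Segment`, `buildManifest`, and the `UNKNOWN_TXID` placeholder are hypothetical names invented for this illustration. The only behaviour it assumes is the one described above, namely that the caller replaces the in-progress segment's lastTxId with the highest written txid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not the real Journal/FileJournalManager classes.
final class ManifestSketch {
  // Assumed placeholder for "lastTxId was never computed by a scan".
  static final long UNKNOWN_TXID = -1L;

  // Hypothetical stand-in for one edit log segment in a manifest.
  static final class Segment {
    final long firstTxId;
    final long lastTxId;
    final boolean inProgress;

    Segment(long firstTxId, long lastTxId, boolean inProgress) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
      this.inProgress = inProgress;
    }

    @Override
    public String toString() {
      return "[" + firstTxId + "," + lastTxId
          + (inProgress ? ",in-progress]" : "]");
    }
  }

  // Models the caller described above: finalized segments pass through
  // unchanged, while an in-progress segment always gets highestWrittenTxId
  // as its end, whatever lastTxId it carried before.
  static List<Segment> buildManifest(List<Segment> logs,
      long highestWrittenTxId) {
    List<Segment> out = new ArrayList<>();
    for (Segment s : logs) {
      if (s.inProgress) {
        out.add(new Segment(s.firstTxId, highestWrittenTxId, true));
      } else {
        out.add(s);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    long highestWritten = 150L;
    // Same segments; one list carries a scanned lastTxId, the other does not.
    List<Segment> scanned = Arrays.asList(
        new Segment(1, 100, false), new Segment(101, 149, true));
    List<Segment> unscanned = Arrays.asList(
        new Segment(1, 100, false), new Segment(101, UNKNOWN_TXID, true));
    // Both lines print the same manifest.
    System.out.println(buildManifest(scanned, highestWritten));
    System.out.println(buildManifest(unscanned, highestWritten));
  }
}
```

In this model the scanned and unscanned inputs produce identical manifests, which is the sense in which the scan's result is discarded. The sketch says nothing about other uses of the scan, such as corruption detection.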
If the end user enables the Observer Read feature, the latency of tailing edits from the JournalNode is very important, in both the normal path and the fallback path.
I found no further comments explaining this scan logic, either in the code or in HDFS-6634, which added it.
The only effect I can see is that the scan checks the in-progress segment for corruption. But the NameNode can already handle a corrupted in-progress segment.
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 27m 52s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | |
| _ trunk Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 44m 9s | trunk passed | |
| +1 :green_heart: | compile | 1m 43s | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | compile | 1m 40s | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | checkstyle | 1m 20s | trunk passed | |
| +1 :green_heart: | mvnsite | 1m 43s | trunk passed | |
| +1 :green_heart: | javadoc | 1m 20s | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 1m 41s | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | spotbugs | 3m 47s | trunk passed | |
| +1 :green_heart: | shadedclient | 26m 28s | branch has no errors when building and testing our client artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 1m 30s | the patch passed | |
| +1 :green_heart: | compile | 1m 33s | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javac | 1m 33s | the patch passed | |
| +1 :green_heart: | compile | 1m 24s | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | javac | 1m 24s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 1m 2s | the patch passed | |
| +1 :green_heart: | mvnsite | 1m 30s | the patch passed | |
| +1 :green_heart: | javadoc | 1m 0s | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 1m 40s | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | spotbugs | 3m 44s | the patch passed | |
| +1 :green_heart: | shadedclient | 27m 29s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| -1 :x: | unit | 355m 59s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 57s | The patch does not generate ASF License warnings. | |
| | | 507m 6s | | |
| Reason | Tests |
|---|---|
| Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
| | hadoop.hdfs.server.mover.TestMover |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4971/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4971 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 94ea6c8de82b 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / dade7665fe054a45f14cf2966f3f0f8bd09dcaee |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4971/1/testReport/ |
| Max. process+thread count | 2308 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4971/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
Thanks @ZanderXu for reporting this. The changes make sense. Can you look at the test failure as well? Thanks.
@ashutoshcipher Thanks for your review and reminder. I have fixed the failed UT testWithKerberizedCluster. The other failed UT, TestMover, passes locally and is not related to this patch.
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 1m 35s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | |
| _ trunk Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 42m 42s | trunk passed | |
| +1 :green_heart: | compile | 1m 57s | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | compile | 1m 33s | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | checkstyle | 1m 20s | trunk passed | |
| +1 :green_heart: | mvnsite | 1m 48s | trunk passed | |
| +1 :green_heart: | javadoc | 1m 27s | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 1m 38s | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | spotbugs | 3m 58s | trunk passed | |
| +1 :green_heart: | shadedclient | 28m 47s | branch has no errors when building and testing our client artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 1m 39s | the patch passed | |
| +1 :green_heart: | compile | 1m 41s | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javac | 1m 41s | the patch passed | |
| +1 :green_heart: | compile | 1m 28s | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | javac | 1m 28s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 1m 6s | the patch passed | |
| +1 :green_heart: | mvnsite | 1m 38s | the patch passed | |
| +1 :green_heart: | javadoc | 1m 2s | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 1m 32s | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 | |
| +1 :green_heart: | spotbugs | 4m 3s | the patch passed | |
| +1 :green_heart: | shadedclient | 27m 45s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 364m 36s | hadoop-hdfs in the patch passed. | |
| +1 :green_heart: | asflicense | 1m 10s | The patch does not generate ASF License warnings. | |
| | | 491m 20s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4971/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4971 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 84ac63d4e941 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 5f1de83d50d8245b92ebdbeb2dd20a5d462286c1 |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4971/2/testReport/ |
| Max. process+thread count | 2088 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4971/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.