
HDFS-16659. JournalNode should throw CacheMissException when SinceTxId is bigger than HighestWrittenTxId

Open ZanderXu opened this issue 3 years ago • 5 comments

Description of PR

JournalNode should throw CacheMissException if sinceTxId is bigger than highestWrittenTxId while handling getJournaledEdits RPCs from NameNodes. With the current logic, in some corner cases the in-progress EditLogTailer cannot replay any edits from the JournalNodes, so the Observer NameNode cannot serve client requests.

Suppose there are 3 JournalNodes, JN0 ~ JN2.

  • JN0 hits some abnormal condition while the Active NameNode is syncing 10 edits starting at txid 11.
  • The NameNode ignores the abnormal JN0 and continues syncing edits to JN1 and JN2.
  • JN0 returns to health.
  • The NameNode continues by syncing 10 edits starting at txid 21.
  • At this point, JN0's cache contains none of edits 11 ~ 30.
  • The Observer NameNode tries to select EditLogInputStreams through getJournaledEdits with sinceTxId 21.
  • JN2 hits some abnormal condition that makes it respond slowly.

The expected result is a response containing the 10 edits from txId 21 to txId 30 from JN1 and JN2, because the Active NameNode successfully wrote these edits to JN1 and JN2 and failed to write them to JN0.

But in the current implementation, the responses are [Response(0) from JN0, Response(10) from JN1], because JN2 responds slowly due to something abnormal such as a GC pause or a bad network. So maxAllowedTxns will be 0, and the NameNode will not replay any edits.
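A simplified model of the NameNode-side arithmetic shows why one empty response is enough to stall the tailer. It assumes, as a reading of `QuorumJournalManager#selectRpcInputStreams` rather than a copy of it, that with `onlyDurableTxns` enabled the NameNode sorts the txn counts of the responses that formed the quorum and takes the count backed by a majority. The class `DurableTxnModel` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the durable-txn computation on the NameNode.
class DurableTxnModel {

  /**
   * Returns the largest txn count that at least majoritySize of the
   * responding JournalNodes can serve.
   *
   * @param responseCounts txn counts from the responses that formed the quorum
   * @param majoritySize   JNs needed for a majority (2 of 3)
   */
  static int maxAllowedTxns(List<Integer> responseCounts, int majoritySize) {
    if (responseCounts.size() < majoritySize) {
      throw new IllegalArgumentException("fewer responses than a majority");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted); // ascending
    // The majoritySize-th largest count is backed by a majority of responders.
    return sorted.get(sorted.size() - majoritySize);
  }

  public static void main(String[] args) {
    // JN0 answered with 0 txns and JN1 with 10; the slow JN2 is not in the quorum.
    System.out.println(maxAllowedTxns(Arrays.asList(0, 10), 2));  // prints 0
    // Had JN1 and JN2 formed the quorum, all 10 txns would be replayed.
    System.out.println(maxAllowedTxns(Arrays.asList(10, 10), 2)); // prints 10
  }
}
```

Sorting first makes the rule independent of arrival order; the point is only that a single empty response inside the quorum drags the result down to 0.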

As above, the root cause is that the JournalNode should throw a CacheMissException when sinceTxId is greater than highestWrittenTxId.

The buggy code is as follows:

if (sinceTxId > getHighestWrittenTxId()) {
    // Requested edits that don't exist yet; short-circuit the cache here
    metrics.rpcEmptyResponses.incr();
    return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); 
}

ZanderXu, Jul 13 '22 14:07

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 37m 41s trunk passed
+1 :green_heart: compile 1m 41s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: compile 1m 37s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 25s trunk passed
+1 :green_heart: mvnsite 1m 44s trunk passed
+1 :green_heart: javadoc 1m 23s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 45s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 41s trunk passed
+1 :green_heart: shadedclient 22m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 24s the patch passed
+1 :green_heart: compile 1m 27s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javac 1m 27s the patch passed
+1 :green_heart: compile 1m 22s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 22s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 1s the patch passed
+1 :green_heart: mvnsite 1m 29s the patch passed
+1 :green_heart: javadoc 0m 58s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 30s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 25s the patch passed
+1 :green_heart: shadedclient 22m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 241m 50s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 3s The patch does not generate ASF License warnings.
350m 48s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/1/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4560
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d1165fe0bbf9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d637d4bf1a303dcacf34ea4f276aac337ce416d7
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/1/testReport/
Max. process+thread count 3347 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus, Jul 13 '22 20:07

@tomscut Are you interested in reviewing this fix for a bug in selecting EditLogInputStreams?

ZanderXu, Jul 14 '22 04:07

@jojochuang @goiri Can you help me review this patch? Thanks

ZanderXu, Jul 20 '22 16:07

@jojochuang @Hexiaoqiao Can you help me review this bug fix?

In a corner case, this bug leaves the Observer NameNode unable to handle client requests. We hit it in our production environment because our JournalNode cluster is deployed across data centers to support cross-DC disaster recovery.

ZanderXu, Jul 27 '22 05:07

@xkrogen Master, can you help me review this patch?

ZanderXu, Jul 28 '22 03:07

Thanks for trying to tackle this issue! Actually @shvachko and I discussed this potential issue long ago but had not observed problems in practice; I guess it is made much worse by using cross-DC JNs.

I don't feel that a CacheMissException is correct. The situation where the NN requests edits newer than what the JNs have is expected to be common, especially if the transaction rate is low, since in this situation the NN will constantly poll the JNs for new edits by sending sinceTxID = highestWrittenTxId + 1. I see you're trying to handle this by special-casing when the sinceTxId is getHighestWrittenTxId() + 1, but it seems pretty hacky/brittle.

My initial thought is that we should make a special-case return value when sinceTxId > highestWrittenTxId (maybe -1) and on the NN side, if you find some responses with txnCount > 0 and some responses with txnCount < 0, then you only use the responses with txnCount > 0. The main issue I see with this is that AsyncLoggerSet#waitForWriteQuorum() isn't set up to handle this kind of situation; it will just return as soon as there are a quorum of non-error responses.
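The sentinel idea, and the limitation just described, can be sketched as a filter over the txn counts in a quorum. The sketch assumes a hypothetical `-1` marker for "sinceTxId is newer than anything this JN has" and follows the rule above: when some counts are positive and some are sentinels, keep only the positive ones. The class and method names are hypothetical; this alternative was not adopted in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the "-1 sentinel" alternative; not part of the patch.
class SentinelFilterModel {

  /** Assumed marker for "sinceTxId is newer than anything this JN has". */
  static final int NEWER_THAN_HIGHEST = -1;

  /** If positive counts and sentinels are mixed, keeps only the positive counts. */
  static List<Integer> usableCounts(List<Integer> quorumCounts) {
    boolean anyPositive = false;
    boolean anySentinel = false;
    for (int count : quorumCounts) {
      anyPositive |= count > 0;
      anySentinel |= count == NEWER_THAN_HIGHEST;
    }
    if (!(anyPositive && anySentinel)) {
      return new ArrayList<>(quorumCounts); // nothing to filter
    }
    List<Integer> usable = new ArrayList<>();
    for (int count : quorumCounts) {
      if (count > 0) {
        usable.add(count);
      }
    }
    return usable;
  }

  /** Whether the responses left after filtering still form a majority. */
  static boolean stillHasMajority(List<Integer> quorumCounts, int majoritySize) {
    return usableCounts(quorumCounts).size() >= majoritySize;
  }

  public static void main(String[] args) {
    // The quorum wait already returned with JN0 (sentinel) and JN1 (10 txns).
    List<Integer> quorum = Arrays.asList(NEWER_THAN_HIGHEST, 10);
    System.out.println(usableCounts(quorum));        // prints [10]
    System.out.println(stillHasMajority(quorum, 2)); // prints false
  }
}
```

Filtering leaves a single response, which is no longer a majority; the NameNode would have to keep waiting for JN2, which a quorum wait that returns on the first majority of non-error responses does not do.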

As an alternative, we could create a new exception different from CacheMissException, like NewerTxnIdException, which the JN throws in the situation of startTxId > highestWrittenTxId. Since it's an exception, waitForWriteQuorum() will try to throw away JNs that threw it. If only some JNs throw the exception, then we still get a valid result from waitForWriteQuorum(). If too many JNs throw the exception, then we can catch it on the NN side and swallow the exception to treat it as a normal/expected situation. I think this would avoid us having to special-case startTxId + 1 on the JN side.
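The property this approach relies on, that a reply carrying an exception does not count toward the quorum, can be sketched with a small model of a quorum wait. The names `QuorumWaitModel`, `Reply`, and `waitForQuorum` are hypothetical, and the sketch assumes replies are consumed in arrival order; `AsyncLoggerSet#waitForWriteQuorum()` also handles timeouts, which are not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a quorum wait that skips failed replies.
class QuorumWaitModel {

  /** Stand-in for the proposed JN-side NewerTxnIdException. */
  static final class NewerTxnIdException extends Exception {
    NewerTxnIdException(String msg) {
      super(msg);
    }
  }

  /** One JN reply: either a txn count or an error. */
  static final class Reply {
    final String jn;
    final int txnCount;
    final Exception error; // null when the JN answered normally

    private Reply(String jn, int txnCount, Exception error) {
      this.jn = jn;
      this.txnCount = txnCount;
      this.error = error;
    }

    static Reply ok(String jn, int txnCount) {
      return new Reply(jn, txnCount, null);
    }

    static Reply failed(String jn, Exception error) {
      return new Reply(jn, 0, error);
    }
  }

  /** Returns the counts of the first majority of successful replies. */
  static List<Integer> waitForQuorum(List<Reply> arrivalOrder, int totalJns,
      int majoritySize) {
    List<Integer> successes = new ArrayList<>();
    int failures = 0;
    for (Reply r : arrivalOrder) {
      if (r.error == null) {
        successes.add(r.txnCount);
        if (successes.size() >= majoritySize) {
          return successes;
        }
      } else if (++failures > totalJns - majoritySize) {
        // Too many failures: a successful majority is no longer possible.
        throw new IllegalStateException("quorum impossible after " + r.jn, r.error);
      }
    }
    throw new IllegalStateException("replies ran out before a quorum formed");
  }

  public static void main(String[] args) {
    // JN0 lags (sinceTxId 21 > its highest 10) and answers first; JN2 is slow.
    List<Reply> arrival = Arrays.asList(
        Reply.failed("JN0", new NewerTxnIdException("requested 21, highest is 10")),
        Reply.ok("JN1", 10),
        Reply.ok("JN2", 10));
    System.out.println(waitForQuorum(arrival, 3, 2)); // prints [10, 10]
  }
}
```

Because JN0's reply is an error, the wait continues until JN2 answers, and the quorum agrees on all 10 txns. If a majority throws, the wait fails as a whole, which is the case the NameNode would then need to catch.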

WDYT?

xkrogen, Aug 18 '22 22:08

we could create a new exception different from CacheMissException, like NewerTxnIdException, which the JN throws in the situation of startTxId > highestWrittenTxId

@xkrogen Master, thanks for your nice suggestion. The case startTxId = highestWrittenTxId + 1 is common, so the JournalNode should return one JournaledEditsResponse with txnCount=0. If sinceTxId > highestTxId + 1, it should just throw NewerTxnIdException and let the NameNode ignore this abnormal JournalNode.

I have updated the patch; please help me review it. Thanks, Master!

ZanderXu, Aug 19 '22 08:08

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 39m 32s trunk passed
+1 :green_heart: compile 1m 44s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: compile 1m 39s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 20s trunk passed
+1 :green_heart: mvnsite 1m 50s trunk passed
+1 :green_heart: javadoc 1m 22s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 45s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 40s trunk passed
+1 :green_heart: shadedclient 23m 12s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 21s the patch passed
+1 :green_heart: compile 1m 26s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javac 1m 26s the patch passed
+1 :green_heart: compile 1m 21s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 21s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 2s the patch passed
+1 :green_heart: mvnsite 1m 26s the patch passed
+1 :green_heart: javadoc 0m 58s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 36s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 27s the patch passed
+1 :green_heart: shadedclient 22m 50s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 240m 36s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 6s The patch does not generate ASF License warnings.
352m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/2/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4560
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux c6d0193ce71b 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 863fa6450301908b4caffd36d0d692180de6169f
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/2/testReport/
Max. process+thread count 3511 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus, Aug 19 '22 14:08

What's the concern with throwing the exception in the highestTxId + 1 case?

@xkrogen Master, maybe we understand sinceTxId == highestTxId + 1 differently. Please correct me if I'm wrong.

  • sinceTxId == highestTxId + 1 is a normal case, especially if the transaction rate is low. So the JournalNode should return an empty GetJournaledEditsResponseProto to the NameNode, not throw NewerTxnIdException.
  • Conversely, if the JournalNode throws a NewerTxnIdException to the NameNode, the NameNode will fall back to selectStreamingInputStreams with getEditLogManifest for this sinceTxId. Because there are no new edits in the JournalNodes, selectStreamingInputStreams will get an empty response too.

Perhaps you mean to compare against nextTxId? If so, maybe we can change the code as below:

if (sinceTxId > nextTxId) {
  throw new JournaledEditsCache.NewerTxnIdException(...);
}

Because sinceTxId == nextTxId is a normal case, JournalNode should return an empty GetJournaledEditsResponseProto.

ZanderXu, Aug 20 '22 04:08

I am suggesting that we also modify QuorumJournalManager#selectInputStreams() like:

      try {
        Collection<EditLogInputStream> rpcStreams = new ArrayList<>();
        selectRpcInputStreams(rpcStreams, fromTxnId, onlyDurableTxns);
        streams.addAll(rpcStreams);
        return;
      } catch (NewerTxnIdException ntie) {
        // normal situation, we requested newer IDs than any journal has. no new streams
        return;
      } catch (IOException ioe) {
        LOG.warn("Encountered exception while tailing edits >= " + fromTxnId +
            " via RPC; falling back to streaming.", ioe);
      }

I say this mainly because we want to use NewerTxnIdException to detect when a JN is lagging, right? But if we special-case sinceTxId == highestTxId + 1, then we might not detect the case where a JN is lagging by one txn.

So let's say we have: JN0 with highest txn ID 1, JN1 with highest txn ID 2, and JN2 with highest txn ID 2 (so JN0 lags by one txn). Now we send out getJournaledEdits() RPCs. JN2 happens to respond slowly, so we get responses from JN0 and JN1. Now it looks like only txn 1 is durably committed and we never load txn 2 -- the same issue you described in your original bug description.

But by throwing NewerTxnIdException, AsyncLoggerSet will instead ignore the response from JN0, so we wait for responses from JN1 and JN2, and we correctly see that up to txn 2 is committed durably.
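The lag-by-one scenario can be replayed with a small model that compares the two JN-side policies under discussion: answering sinceTxId == highestTxId + 1 with an empty response, versus throwing whenever sinceTxId > highestTxId. The model is hypothetical: `LagByOneModel` and its methods are illustrative names, a thrown NewerTxnIdException is represented as a skipped (`null`) reply, the per-RPC txn limit and cache misses are ignored, and the quorum is the first majority of non-throwing replies in arrival order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model comparing the two JN-side policies for sinceTxId.
class LagByOneModel {

  /** A JN's reply: a txn count, or null if it throws NewerTxnIdException. */
  static Integer reply(long highestWrittenTxId, long sinceTxId,
      boolean specialCasePlusOne) {
    if (sinceTxId > highestWrittenTxId) {
      if (specialCasePlusOne && sinceTxId == highestWrittenTxId + 1) {
        return 0; // treated as "nothing new yet"
      }
      return null; // the JN throws; the quorum wait skips it
    }
    return (int) (highestWrittenTxId - sinceTxId + 1);
  }

  /** Durable txn count from the first majority of non-throwing replies. */
  static int durableTxns(long[] highestInArrivalOrder, long sinceTxId,
      boolean specialCasePlusOne, int majoritySize) {
    List<Integer> counts = new ArrayList<>();
    for (long highest : highestInArrivalOrder) {
      Integer r = reply(highest, sinceTxId, specialCasePlusOne);
      if (r != null) {
        counts.add(r);
        if (counts.size() == majoritySize) {
          break;
        }
      }
    }
    if (counts.size() < majoritySize) {
      throw new IllegalStateException("no quorum of non-throwing replies");
    }
    Collections.sort(counts);
    return counts.get(counts.size() - majoritySize);
  }

  public static void main(String[] args) {
    // JN0 (highest 1) and JN1 (highest 2) answer first; JN2 (highest 2) is slow.
    long[] arrival = {1, 2, 2};
    System.out.println(durableTxns(arrival, 2, true, 2));  // prints 0
    System.out.println(durableTxns(arrival, 2, false, 2)); // prints 1
  }
}
```

With the special case, JN0's empty reply joins the quorum and txn 2 is never loaded; with the strict check, JN0 is skipped and the quorum of JN1 and JN2 reports txn 2 as durable.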

Does this clarify? I agree the situation I describe should be rare, but I feel that we can cleanly solve it by using NewerTxnIdException.

xkrogen, Aug 25 '22 00:08

@xkrogen Master, thanks for your detailed explanation and nice suggestion. I will update this patch with this idea.

ZanderXu, Aug 25 '22 01:08

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 38m 30s trunk passed
+1 :green_heart: compile 1m 42s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: compile 1m 30s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 18s trunk passed
+1 :green_heart: mvnsite 1m 46s trunk passed
+1 :green_heart: javadoc 1m 25s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 41s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 34s trunk passed
+1 :green_heart: shadedclient 22m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 20s the patch passed
+1 :green_heart: compile 1m 26s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javac 1m 26s the patch passed
+1 :green_heart: compile 1m 20s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 20s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 59s the patch passed
+1 :green_heart: mvnsite 1m 25s the patch passed
+1 :green_heart: javadoc 0m 55s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 30s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 23s the patch passed
+1 :green_heart: shadedclient 22m 22s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 237m 56s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 6s The patch does not generate ASF License warnings.
347m 4s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/3/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4560
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux b6ef6639ef5c 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 08e0dae7ddf43a22e699a34b21dbe7e32755969c
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/3/testReport/
Max. process+thread count 3832 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus, Aug 25 '22 08:08

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 37m 56s trunk passed
+1 :green_heart: compile 1m 36s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: compile 1m 30s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 18s trunk passed
+1 :green_heart: mvnsite 1m 45s trunk passed
+1 :green_heart: javadoc 1m 22s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 36s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 34s trunk passed
+1 :green_heart: shadedclient 23m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 17s the patch passed
+1 :green_heart: compile 1m 28s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javac 1m 28s the patch passed
+1 :green_heart: compile 1m 18s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 18s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 57s the patch passed
+1 :green_heart: mvnsite 1m 19s the patch passed
+1 :green_heart: javadoc 0m 53s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 :green_heart: javadoc 1m 30s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 26s the patch passed
+1 :green_heart: shadedclient 22m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 239m 21s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 3s The patch does not generate ASF License warnings.
347m 47s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/4/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4560
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 0a38b1c3eb4c 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 69e3d9e8f81581c3bf96ed4cf34bc4226a74586a
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/4/testReport/
Max. process+thread count 2936 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus, Aug 26 '22 09:08

@xkrogen Sir, sorry to ping you again. Please help me review this patch again.

ZanderXu, Sep 01 '22 23:09

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 39m 59s trunk passed
+1 :green_heart: compile 1m 38s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 30s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 18s trunk passed
+1 :green_heart: mvnsite 1m 45s trunk passed
+1 :green_heart: javadoc 1m 16s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 39s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 44s trunk passed
+1 :green_heart: shadedclient 23m 30s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 19s the patch passed
+1 :green_heart: compile 1m 23s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 23s the patch passed
+1 :green_heart: compile 1m 24s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 24s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 0m 58s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 17 unchanged - 0 fixed = 18 total (was 17)
+1 :green_heart: mvnsite 1m 30s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 31s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 21s the patch passed
+1 :green_heart: shadedclient 22m 25s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 253m 4s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 12s The patch does not generate ASF License warnings.
364m 30s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/5/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4560
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 12111e311603 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / fc9b2d6b18e0571cc4fce42c3d134ff263039b55
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/5/testReport/
Max. process+thread count 2920 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus, Sep 02 '22 09:09

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 38m 45s trunk passed
+1 :green_heart: compile 1m 34s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 32s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 23s trunk passed
+1 :green_heart: mvnsite 1m 38s trunk passed
+1 :green_heart: javadoc 1m 19s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 43s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 39s trunk passed
+1 :green_heart: shadedclient 23m 6s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 19s the patch passed
+1 :green_heart: compile 1m 24s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 24s the patch passed
+1 :green_heart: compile 1m 16s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 16s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 58s the patch passed
+1 :green_heart: mvnsite 1m 22s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 29s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 16s the patch passed
+1 :green_heart: shadedclient 22m 28s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 237m 30s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 0s The patch does not generate ASF License warnings.
346m 21s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/6/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4560
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f9bd71d68f1c 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 095e80b0dbfb99371d2489fb37844f847c6cf820
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/6/testReport/
Max. process+thread count 3519 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4560/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus, Sep 02 '22 16:09

Merged to trunk. Thanks for the contribution @ZanderXu !

xkrogen, Sep 06 '22 17:09

@xkrogen Sir, thank you for your patient answers and reviews, as well as nice suggestions.

ZanderXu, Sep 07 '22 01:09

@xkrogen After thinking it through more deeply and doing some verification, I found two places that should be fixed for the case sinceTxId = highestTxId + 1.

Currently the JournalNode throws a NewerTxnIdException to the NameNode, and we expect the NameNode to catch NewerTxnIdException during selectRpcInputStreams and ignore it.

But the NameNode gets a QuorumException from selectRpcInputStreams instead, because a majority of JournalNodes threw NewerTxnIdException. The NameNode then falls back to selectStreamingInputStreams.
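The masking effect can be modeled in isolation. The sketch below assumes, hypothetically, that the quorum failure keeps the per-JN exceptions in a list (the real QuorumException may only carry them in its message), and compares a handler that matches only NewerTxnIdException with one that also inspects the quorum failure. All class names are stand-ins rather than Hadoop APIs, and the sketch illustrates the problem only; it is not how HDFS-16771 resolved it.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the masking problem; class names are stand-ins.
class QuorumFailureModel {

  /** Stand-in for the JN-side NewerTxnIdException. */
  static final class NewerTxnIdException extends IOException {
    NewerTxnIdException(String msg) {
      super(msg);
    }
  }

  /** Stand-in for a quorum failure; assumed here to keep the per-JN causes. */
  static final class QuorumFailure extends IOException {
    final List<Exception> causes;

    QuorumFailure(List<Exception> causes) {
      super("too many JournalNodes failed to form a quorum");
      this.causes = causes;
    }
  }

  /** Mirrors a catch clause that only matches NewerTxnIdException itself. */
  static String naiveHandle(IOException e) {
    return (e instanceof NewerTxnIdException) ? "no new edits" : "fall back to streaming";
  }

  /** Also treats a quorum failure made only of NewerTxnIdExceptions as "no new edits". */
  static String handle(IOException e) {
    if (e instanceof NewerTxnIdException) {
      return "no new edits";
    }
    if (e instanceof QuorumFailure) {
      List<Exception> causes = ((QuorumFailure) e).causes;
      boolean allNewer = !causes.isEmpty();
      for (Exception cause : causes) {
        allNewer &= cause instanceof NewerTxnIdException;
      }
      if (allNewer) {
        return "no new edits";
      }
    }
    return "fall back to streaming";
  }

  public static void main(String[] args) {
    // Two of three JNs report that sinceTxId is newer than their highest txn.
    QuorumFailure qf = new QuorumFailure(Arrays.<Exception>asList(
        new NewerTxnIdException("JN0"), new NewerTxnIdException("JN1")));
    System.out.println(naiveHandle(qf)); // prints: fall back to streaming
    System.out.println(handle(qf));      // prints: no new edits
  }
}
```

The naive handler never sees a NewerTxnIdException directly, because the quorum layer reports one combined failure; only a handler that looks inside that failure can tell "every JN is simply up to date" apart from a real error.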

Besides this problem, JournalNodeRpcServer shouldn't log anything about NewerTxnIdException when sinceTxId = highestTxId + 1, but it should log NewerTxnIdException when sinceTxId > highestTxId + 1.

Given the cases above, how about handling them differently? For example:

    long highestTxId = getHighestWrittenTxId();
    if (sinceTxId == highestTxId + 1) {
      // This is normal case and will return one response with 0 txnCount.
      metrics.rpcEmptyResponses.incr();
      return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
    } else if (sinceTxId > highestTxId) {
      // Requested edits that don't exist yet and is newer than highestTxId.
      metrics.rpcEmptyResponses.incr();
      throw new NewerTxnIdException(
          "Highest txn ID available in the journal is %d, but requested txns starting at %d.",
          highestTxId, sinceTxId);
    }

ZanderXu, Sep 13 '22 10:09

@xkrogen Sir, I have created a new ticket, HDFS-16771, to resolve these problems. If you have any good ideas, please share them with me. Thanks.

ZanderXu, Sep 13 '22 11:09