HDFS-16179. Update loglevel for BlockManager#chooseExcessRedundancyStriped
JIRA: HDFS-16179
```java
private void chooseExcessRedundancyStriped(BlockCollection bc,
    final Collection<DatanodeStorageInfo> nonExcess,
    BlockInfo storedBlock,
    DatanodeDescriptor delNodeHint) {
  ...
  // cardinality of found indicates the expected number of internal blocks
  final int numOfTarget = found.cardinality();
  final BlockStoragePolicy storagePolicy = storagePolicySuite.getPolicy(
      bc.getStoragePolicyID());
  final List<StorageType> excessTypes = storagePolicy.chooseExcess(
      (short) numOfTarget, DatanodeStorageInfo.toStorageTypes(nonExcess));
  if (excessTypes.isEmpty()) {
    LOG.warn("excess types chosen for block {} among storages {} is empty",
        storedBlock, nonExcess);
    return;
  }
  ...
}
```
IMO, this code is only detecting excess storage types, so lowering the log level to DEBUG here has no adverse effect.
We have a cluster that uses the EC policy to store data. With the current WARN level, about 286,093 of these logs were printed in roughly 50 minutes, which drowns out other important logs.
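A sketch of the proposed change (only the log level moves from WARN to DEBUG; the surrounding code is unchanged):

```java
if (excessTypes.isEmpty()) {
  // Downgraded from LOG.warn: an empty excess-type list is a routine
  // outcome here, not an anomaly worth flooding the NameNode log for.
  LOG.debug("excess types chosen for block {} among storages {} is empty",
      storedBlock, nonExcess);
  return;
}
```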
LGTM
Thanks @ayushtkn for your review.
:broken_heart: -1 overall
Vote | Subsystem | Runtime | Logfile | Comment |
---|---|---|---|---|
+0 :ok: | reexec | 1m 4s | Docker mode activated. | |
_ Prechecks _ | ||||
+1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
+0 :ok: | codespell | 0m 1s | codespell was not available. | |
+1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
-1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | |
_ trunk Compile Tests _ | ||||
+1 :green_heart: | mvninstall | 37m 48s | trunk passed | |
+1 :green_heart: | compile | 1m 25s | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | |
+1 :green_heart: | compile | 1m 15s | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | |
+1 :green_heart: | checkstyle | 1m 0s | trunk passed | |
+1 :green_heart: | mvnsite | 1m 22s | trunk passed | |
+1 :green_heart: | javadoc | 0m 56s | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | |
+1 :green_heart: | javadoc | 1m 27s | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | |
+1 :green_heart: | spotbugs | 3m 19s | trunk passed | |
+1 :green_heart: | shadedclient | 19m 7s | branch has no errors when building and testing our client artifacts. | |
_ Patch Compile Tests _ | ||||
+1 :green_heart: | mvninstall | 1m 15s | the patch passed | |
+1 :green_heart: | compile | 1m 21s | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | |
+1 :green_heart: | javac | 1m 21s | the patch passed | |
+1 :green_heart: | compile | 1m 7s | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | |
+1 :green_heart: | javac | 1m 7s | the patch passed | |
+1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
+1 :green_heart: | checkstyle | 0m 53s | the patch passed | |
+1 :green_heart: | mvnsite | 1m 17s | the patch passed | |
+1 :green_heart: | javadoc | 0m 48s | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | |
+1 :green_heart: | javadoc | 1m 23s | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | |
+1 :green_heart: | spotbugs | 3m 21s | the patch passed | |
+1 :green_heart: | shadedclient | 18m 46s | patch has no errors when building and testing our client artifacts. | |
_ Other Tests _ | ||||
-1 :x: | unit | 354m 15s | /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch passed. |
+1 :green_heart: | asflicense | 0m 37s | The patch does not generate ASF License warnings. | |
451m 10s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner |
 | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3313/1/artifact/out/Dockerfile |
GITHUB PR | https://github.com/apache/hadoop/pull/3313 |
Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
uname | Linux 0cb16d9f8243 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/bin/hadoop.sh |
git revision | trunk / a121470e5b175de09f0697d510e9b26612b2f6e1 |
Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3313/1/testReport/ |
Max. process+thread count | 2110 (vs. ulimit of 5500) |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3313/1/console |
versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
Hi @jojochuang @ferhui, could you please also review the code? Thanks a lot.
@tomscut Thanks for the contribution. I'm confused here: is the log necessary or not? If it is necessary, then, as you said, the many logs drown out other logs. If it is not necessary, I think DEBUG level is OK.
Thanks @ferhui for your review. This method just detects excess redundancies and deletes them, but in most cases there are no redundancies. IMO, the log is not necessary here.
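To illustrate why the list is usually empty, here is a simplified, hypothetical model of what BlockStoragePolicy#chooseExcess does (not the real implementation; names and types are illustrative): storages beyond the expected count are excess, so a striped group with exactly the expected number of internal blocks yields an empty list, which is precisely the case that fires this log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChooseExcessDemo {
  // Simplified model: keep the first `expected` storages and report the
  // rest as excess. If nothing exceeds the expected count, the result is
  // empty.
  static List<String> chooseExcess(int expected, List<String> chosen) {
    List<String> excess = new ArrayList<>();
    for (int i = expected; i < chosen.size(); i++) {
      excess.add(chosen.get(i));
    }
    return excess;
  }

  public static void main(String[] args) {
    // An RS-6-3 striped group expects 9 internal blocks; with exactly 9
    // storages there is nothing excess, so the list is empty.
    System.out.println(chooseExcess(9, Arrays.asList(
        "DISK", "DISK", "DISK", "DISK", "DISK",
        "DISK", "DISK", "DISK", "DISK")));  // prints []
  }
}
```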
@tomscut Thanks for the comments.

> but there are no redundancies in most cases

I see that if shouldProcessExtraRedundancy returns true, it will enter this method; if there are no redundancies, it will not enter this method. Is that right? Is there a reason it prints so many logs? Maybe there is a hidden cause.
Thanks @ferhui for your comments and careful consideration. I found a related issue, HDFS-9876, which we can refer to.
Hi @Jing9, could you please take a look at this? Thanks.
We do not see that many of these logs in our cluster of thousands of nodes, so I guess there is a hidden reason behind it. If we change it to DEBUG level, we may overlook some information.
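On the concern about losing information at DEBUG level: the message is not gone, only off by default. A minimal, self-contained SLF4J sketch (class name and sample arguments are illustrative) showing that a parameterized DEBUG statement costs almost nothing when the level is disabled, yet remains available once an operator enables DEBUG on the logger:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugLevelDemo {
  private static final Logger LOG =
      LoggerFactory.getLogger(DebugLevelDemo.class);

  public static void main(String[] args) {
    // SLF4J renders the message only if DEBUG is enabled for this logger,
    // so the statement is cheap on a hot path in production but still
    // prints full details when DEBUG is switched on for troubleshooting.
    LOG.debug("excess types chosen for block {} among storages {} is empty",
        "blk_-9223372036854775792_1001", "[DISK, DISK, DISK]");
  }
}
```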
Do you also use the EC policy? We have a cold-data cluster of 350 nodes where all directories use the EC policy, and we ran into this problem.
Yes, EC policy is used.
Thank you very much for your quick reply. There may be other reasons; we will take a closer look.
@tomscut Hi, is this WARN log printed only when writing EC files? These WARN logs also appeared in our cluster without writing any files, though not as many as you described. I found that the block in the WARN log belongs to a file written a long time ago. So I have some guesses:
- Is there a daemon thread calling this method?
- Do other conditions trigger this method?

Here is our 3-hour running log.
Thanks @whbing for your comments. I found those logs were printed after completeFile; they are triggered by FSDirWriteFileOp#completeFileInternal(). Our Hadoop version is 3.1.0.
```java
private static boolean completeFileInternal(
    FSNamesystem fsn, INodesInPath iip,
    String holder, Block last, long fileId)
    throws IOException {
  (...)
  fsn.finalizeINodeFileUnderConstruction(src, pendingFile,
      Snapshot.CURRENT_STATE_ID, true);
  (...)
  return true;
}
```
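For readers following the thread, a sketch of the suspected call chain; the two intermediate method names are assumptions based on Hadoop 3.x trunk and are not verified against 3.1.0:

```java
// Suspected call chain (intermediate names are assumptions):
//
//   FSDirWriteFileOp#completeFileInternal()
//     -> FSNamesystem#finalizeINodeFileUnderConstruction()
//       -> BlockManager#checkRedundancy()               // assumption
//         -> BlockManager#processExtraRedundancy()      // assumption
//           -> BlockManager#chooseExcessRedundancyStriped()
```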
> Is this WARN log printed only when writing EC files?
Yes, those WARN logs were printed only when writing EC files, because they are emitted from BlockManager#chooseExcessRedundancyStriped().
@ferhui @whbing I found that when the balancer is running, the log is printed; when the balancer is stopped, it is not.
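As an aside for anyone hitting this while the balancer runs: because Hadoop 3.x logs through a log4j 1.2 backend, the BlockManager logger can be raised to DEBUG and restored at runtime instead of keeping the message at WARN permanently. A hypothetical in-process sketch (in practice an operator would adjust the level on the NameNode itself, e.g. via its logLevel facility, rather than run code like this):

```java
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class ToggleBlockManagerLogLevel {
  public static void main(String[] args) {
    // Temporarily enable DEBUG for BlockManager while reproducing the
    // balancer scenario, then restore the previous level.
    Logger log = LogManager.getLogger(
        "org.apache.hadoop.hdfs.server.blockmanagement.BlockManager");
    Level previous = log.getLevel();  // may be null if inherited
    log.setLevel(Level.DEBUG);
    // ... run the balancer / reproduce the scenario, inspect the log ...
    log.setLevel(previous);           // null restores the inherited level
  }
}
```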
@tomscut Hi, did you find the root cause? We encountered the same issue after the balancer started.