ozone
ozone copied to clipboard
HDDS-5455. Check invalid keyName when creating new key
What changes were proposed in this pull request?
In OzoneManagerStateMachine.runCommand, if there is a Runtime exception, OM will be terminated.
In our case, the user would like to create an object with malformed characters in keyName, and the following error happened and throws an InvalidPathException, which is a Runtime Exception causing the OM's shutdown.
Normally this wouldn't happen, since a user can not easily create an invalid keyName with UTF-8. It happened in our environment because our OM started with the property of "file.encoding" and "sun.jnu.encoding" set to "ANSI_X3.4-1968", and it's easy for the user to write a invalid keyName.
This issue can be avoided when running in a good environment, but it's still not reasonable that a client create key operation causing the shutdown of OM.
This ticket is to check the keyName if it fits the current encoding of OM, if not just return the OMException of "INVALID_KEY_NAME".
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5455
How was this patch tested?
unit test
@elek @adoroszlai Could you help to check this PR?
@adoroszlai Thanks for the review.
Seems the config of "OZONE_OM_KEYNAME_CHARACTER_CHECK_ENABLED_KEY" will deny
* "\", "{", "}", "^", "<", ">", "#", "|", "%", "`", "[", "]", "~", "?"
* and Non-printable ASCII characters (128–255 decimal characters).
, but if the user does create a name contains something like "U-00000000", it won't help with the issue.
I think it is the catastrophic outcome that is not acceptable to users. In our scenario, SRE provides the basic env and we install the applications, we are not quite sure if SRE has set the correcting encoding. So we start the Ozone cluster, then it runs well for several months, suddenly the OM crashes because users are creating some invalid keys.
Besides, the keys are created by the scheduling system, the invalid key was created several hours after we told them to correct them, which means if we don't check the "Paths.get", the OM cluster isn't available for several hours, and it's not even decided by cluster maintainers, but the users.
It took some time before I found out it's related to the encoding. When the outage happens, and the error message is so obvious, I won't even check the env, so the recovery still relies on the users.
IMHO, the encoding of JVM should affect Ozone so much. If Ozone runs on "ANSI_X3.4-1968", it should provide the functions on this encoding, and if this encoding can not resolve a keyName, it should just return an INVALID_KEY_NAME and reject. If runs on "UTF-8", it should reject the user requests as well if it can not resolve.
Attached the original error log and keyName. The original error is as follows:
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /vol_name/bucket_name/task-exec/di_scheduler_test.monhly_dudutestshell063002??prod_2021071519_HOUR_1_0_prod/main.sh
at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
at java.nio.file.Paths.get(Paths.java:84)
at org.apache.hadoop.ozone.util.RadixTree.getLongestPrefix(RadixTree.java:190)
at org.apache.hadoop.ozone.om.PrefixManagerImpl.getLongestPrefixPath(PrefixManagerImpl.java:260)
at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.getAclsForKey(OMKeyRequest.java:284)
at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.createKeyInfo(OMKeyRequest.java:267)
at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:352)
at org.apache.hadoop.ozone.om.request.key.OMKeyCreateRequest.validateAndUpdateCache(OMKeyCreateRequest.java:278)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:227)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:417)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$applyTransaction$1(OzoneManagerStateMachine.java:242)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The keyName user wants to create is task-exec/di_scheduler_test.monhly_dudutestshell063002\342\200\224\342\200\224prod_2021071519_HOUR_1_0_prod/main.sh
.
I think it is the catastrophic outcome that is not acceptable to users.
I agree and would like to get this fixed. I'm just wondering about the best way. I would prefer least intrusion for users, even if it requires more work now code-wise. For example we might want to replace existing Paths.get
usage with some other implementation insensitive to file.encoding
. If I understand correctly it is mainly used as a convenient way to split key names at /
, not for actually creating filesystem objects.
Seems Paths.get()
has been used in many places other than RadixTree.getLongestPrefix()
.
Seems
Paths.get()
has been used in many places other thanRadixTree.getLongestPrefix()
.
I guess we should fix those places, too.
@adoroszlai Thanks for revisiting this PR. I was wondering that this ticket is more than file encoding. As the test case shows, it can be triggered deliberately, a malicious user could be able to crash the system with invalid keyname.
@adoroszlai it can be triggered deliberately, a malicious user could be able to crash the system with invalid keyname.
Understood. That's why we should fix it. Just not by rejecting requests.
If there is a invalid character, it will always throw InvalidPathException if we use methods related to "Path". So are you suggesting that we should remove all the use of "Path", and simply use String to do the work? If so this could be a big change, since many interfaces are using "Path", many compatible issues might happen during the change.
So are you suggesting that we should remove all the use of "Path", and simply use String to do the work?
Exactly. Only for volume/bucket/key name parsing, we can keep Path for real file-related operations.
Sorry for the late reply. I tried to replace all the "Paths.get" with "String.split" and array oprerations, but I found some special cases with "~", ".", "..", which needs the help from "Paths", I'm not sure if it's the best approach to do the replacement here.
I found some special cases with "~", ".", "..", which needs the help from "Paths"
Why do we need any of those for volume/bucket/key names?
I think it's normal when users are using OFS protocol.
@symious does this PR need to be revisited?
@kerneltime Sure, thank you for the review.
I updated the PR, because in our cluster, we found invalidPathException not only in CreateKeyRequest, it also emerges in other requests like CreateFileRequset, DeleteKeyRequset.
The trick thing is the path from Protobuf message seems normal, but sometimes it still throws the exception, and it can not be reproduced, it seems happen only once every two days in our UAT cluster.
@symious Please try to avoid force-push when updating the PR. Here are some great articles that explain why:
https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review
@adoroszlai Understand, thank you for the reminder.