ozone icon indicating copy to clipboard operation
ozone copied to clipboard

HDDS-5455. Check invalid keyName when creating new key

Open symious opened this issue 3 years ago • 12 comments

What changes were proposed in this pull request?

In OzoneManagerStateMachine.runCommand, if there is a Runtime exception, OM will be terminated.

In our case, the user would like to create an object with malformed characters in keyName, and the following error happened and throws an InvalidPathException, which is a Runtime Exception causing the OM's shutdown.

Normally this wouldn't happen, since a user can not easily create an invalid keyName with UTF-8. It happened in our environment because our OM started with the property of "file.encoding" and "sun.jnu.encoding" set to "ANSI_X3.4-1968", and it's easy for the user to write a invalid keyName.

This issue can be avoided when running in a good environment, but it's still not reasonable that a client create key operation causing the shutdown of OM.

This ticket is to check the keyName if it fits the current encoding of OM, if not just return the OMException of "INVALID_KEY_NAME".

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5455

How was this patch tested?

unit test

symious avatar Jul 16 '21 09:07 symious

@elek @adoroszlai Could you help to check this PR?

symious avatar Jul 16 '21 15:07 symious

@adoroszlai Thanks for the review.

Seems the config of "OZONE_OM_KEYNAME_CHARACTER_CHECK_ENABLED_KEY" will deny

   * "\", "{", "}", "^", "<", ">", "#", "|", "%", "`", "[", "]", "~", "?"
   * and Non-printable ASCII characters (128–255 decimal characters).

, but if the user does create a name contains something like "U-00000000", it won't help with the issue.

I think it is the catastrophic outcome that is not acceptable to users. In our scenario, SRE provides the basic env and we install the applications, we are not quite sure if SRE has set the correcting encoding. So we start the Ozone cluster, then it runs well for several months, suddenly the OM crashes because users are creating some invalid keys.

Besides, the keys are created by the scheduling system, the invalid key was created several hours after we told them to correct them, which means if we don't check the "Paths.get", the OM cluster isn't available for several hours, and it's not even decided by cluster maintainers, but the users.

It took some time before I found out it's related to the encoding. When the outage happens, and the error message is so obvious, I won't even check the env, so the recovery still relies on the users.

IMHO, the encoding of JVM should affect Ozone so much. If Ozone runs on "ANSI_X3.4-1968", it should provide the functions on this encoding, and if this encoding can not resolve a keyName, it should just return an INVALID_KEY_NAME and reject. If runs on "UTF-8", it should reject the user requests as well if it can not resolve.

Attached the original error log and keyName. The original error is as follows:

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /vol_name/bucket_name/task-exec/di_scheduler_test.monhly_dudutestshell063002??prod_2021071519_HOUR_1_0_prod/main.sh
	at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
	at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
	at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
	at java.nio.file.Paths.get(Paths.java:84)
	at org.apache.hadoop.ozone.util.RadixTree.getLongestPrefix(RadixTree.java:190)
	at org.apache.hadoop.ozone.om.PrefixManagerImpl.getLongestPrefixPath(PrefixManagerImpl.java:260)
	at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.getAclsForKey(OMKeyRequest.java:284)
	at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.createKeyInfo(OMKeyRequest.java:267)
	at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:352)
	at org.apache.hadoop.ozone.om.request.key.OMKeyCreateRequest.validateAndUpdateCache(OMKeyCreateRequest.java:278)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:227)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:417)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$applyTransaction$1(OzoneManagerStateMachine.java:242)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The keyName user wants to create is task-exec/di_scheduler_test.monhly_dudutestshell063002\342\200\224\342\200\224prod_2021071519_HOUR_1_0_prod/main.sh.

symious avatar Jul 19 '21 03:07 symious

I think it is the catastrophic outcome that is not acceptable to users.

I agree and would like to get this fixed. I'm just wondering about the best way. I would prefer least intrusion for users, even if it requires more work now code-wise. For example we might want to replace existing Paths.get usage with some other implementation insensitive to file.encoding. If I understand correctly it is mainly used as a convenient way to split key names at /, not for actually creating filesystem objects.

adoroszlai avatar Jul 19 '21 09:07 adoroszlai

Seems Paths.get() has been used in many places other than RadixTree.getLongestPrefix().

symious avatar Jul 19 '21 10:07 symious

Seems Paths.get() has been used in many places other than RadixTree.getLongestPrefix().

I guess we should fix those places, too.

adoroszlai avatar Dec 27 '21 11:12 adoroszlai

@adoroszlai Thanks for revisiting this PR. I was wondering that this ticket is more than file encoding. As the test case shows, it can be triggered deliberately, a malicious user could be able to crash the system with invalid keyname.

symious avatar Jul 25 '22 09:07 symious

@adoroszlai it can be triggered deliberately, a malicious user could be able to crash the system with invalid keyname.

Understood. That's why we should fix it. Just not by rejecting requests.

adoroszlai avatar Jul 25 '22 09:07 adoroszlai

If there is a invalid character, it will always throw InvalidPathException if we use methods related to "Path". So are you suggesting that we should remove all the use of "Path", and simply use String to do the work? If so this could be a big change, since many interfaces are using "Path", many compatible issues might happen during the change.

symious avatar Jul 26 '22 08:07 symious

So are you suggesting that we should remove all the use of "Path", and simply use String to do the work?

Exactly. Only for volume/bucket/key name parsing, we can keep Path for real file-related operations.

adoroszlai avatar Jul 26 '22 08:07 adoroszlai

Sorry for the late reply. I tried to replace all the "Paths.get" with "String.split" and array oprerations, but I found some special cases with "~", ".", "..", which needs the help from "Paths", I'm not sure if it's the best approach to do the replacement here.

symious avatar Aug 15 '22 06:08 symious

I found some special cases with "~", ".", "..", which needs the help from "Paths"

Why do we need any of those for volume/bucket/key names?

adoroszlai avatar Aug 15 '22 18:08 adoroszlai

I think it's normal when users are using OFS protocol.

symious avatar Aug 16 '22 00:08 symious

@symious does this PR need to be revisited?

kerneltime avatar Nov 07 '22 17:11 kerneltime

@kerneltime Sure, thank you for the review.

I updated the PR, because in our cluster, we found invalidPathException not only in CreateKeyRequest, it also emerges in other requests like CreateFileRequset, DeleteKeyRequset.

The trick thing is the path from Protobuf message seems normal, but sometimes it still throws the exception, and it can not be reproduced, it seems happen only once every two days in our UAT cluster.

symious avatar Nov 08 '22 09:11 symious

@symious Please try to avoid force-push when updating the PR. Here are some great articles that explain why:

https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

adoroszlai avatar Nov 15 '22 06:11 adoroszlai

@adoroszlai Understand, thank you for the reminder.

symious avatar Nov 15 '22 06:11 symious