
Why is Unicode NULL codepoint invalid?

Open · adamjernst opened this issue 14 years ago • 5 comments

Why is the Unicode NULL codepoint, when properly escaped, invalid? That is, why is a string like "Testing\u0000Hello world" rejected as invalid?

isValidCodePoint returns sourceIllegal for the NULL character (ch == 0U). The headers say:

The code in isValidCodePoint() is derived from the ICU code in
utf.h for the macros U_IS_UNICODE_NONCHAR and U_IS_UNICODE_CHAR.

However, U_IS_UNICODE_CHAR(0) returns true.
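
For reference, here is a minimal sketch of that ICU check, paraphrased from the utf.h macros (the helper names are mine; only the constants come from ICU):

```objc
#include <stdint.h>
#include <stdbool.h>

// Paraphrased from ICU's U_IS_UNICODE_NONCHAR: U+FDD0..U+FDEF plus any
// code point whose low 16 bits are FFFE or FFFF, up to U+10FFFF.
static bool is_unicode_nonchar(uint32_t c) {
    return (c >= 0xFDD0u) &&
           (c <= 0xFDEFu || (c & 0xFFFEu) == 0xFFFEu) &&
           (c <= 0x10FFFFu);
}

// Paraphrased from ICU's U_IS_UNICODE_CHAR: everything below the surrogate
// range, or anything above it up to U+10FFFF that is not a noncharacter.
static bool is_unicode_char(uint32_t c) {
    return (c < 0xD800u) ||
           (c > 0xDFFFu && c <= 0x10FFFFu && !is_unicode_nonchar(c));
}

// is_unicode_char(0) evaluates to true: U+0000 is a valid Unicode scalar
// value, so JSONKit's rejection of it is an extra rule layered on top.
```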

In addition, RFC 4627 makes a passing reference to U+0000 as being allowed:

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence

I can of course use the LooseUnicode option to replace it, but that's not ideal since it is valid JSON as far as I can tell.
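
For anyone landing here, a sketch of that workaround, assuming JSONKit's JSONDecoder API (the category methods on NSString/NSData should work the same way):

```objc
#import <Foundation/Foundation.h>
#import "JSONKit.h"

int main(void) {
    @autoreleasepool {
        // Hypothetical input with an escaped NUL inside a string value.
        NSString *json = @"{\"title\":\"Testing\\u0000Hello world\"}";
        NSData *data = [json dataUsingEncoding:NSUTF8StringEncoding];

        NSError *error = nil;
        // The strict default rejects the \u0000 escape; with
        // JKParseOptionLooseUnicode it is replaced rather than rejected.
        JSONDecoder *decoder =
            [JSONDecoder decoderWithParseOptions:JKParseOptionLooseUnicode];
        id result = [decoder objectWithData:data error:&error];
        NSLog(@"result = %@, error = %@", result, error);
    }
    return 0;
}
```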

adamjernst avatar Oct 12 '11 16:10 adamjernst

Well, I have to admit, you certainly did your homework, more so than most people. :)

Just one question... did you read the README.md? :)

johnezang avatar Oct 21 '11 18:10 johnezang

Well, look at that, a whole paragraph for this exact issue. :-)

I do disagree, since I think JSONKit should just do the right thing and let the user deal with the security implications. Most users will be using NSString anyway, which does handle null characters correctly. However, yours is a perfectly valid decision.
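
To illustrate the NSString point, a quick Foundation-only check (nothing JSONKit-specific):

```objc
#import <Foundation/Foundation.h>
#include <string.h>

int main(void) {
    @autoreleasepool {
        // NSString tracks its length explicitly, so an embedded U+0000 is
        // preserved; it is the NUL-terminated C-string view that truncates.
        unichar chars[3] = { 'A', 0x0000, 'B' };
        NSString *s = [NSString stringWithCharacters:chars length:3];
        NSLog(@"NSString length: %lu", (unsigned long)[s length]); // 3
        NSLog(@"strlen(UTF8String): %zu", strlen([s UTF8String])); // 1
    }
    return 0;
}
```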

I would request a more specific error message for this particular case, at least; e.g. "\u0000 is not allowed for security reasons; use JKParseOptionLooseUnicode". Whether this is practical is up to you. It would have saved me an hour of research, but this is an obscure case.

BTW, the trigger was that I'm dealing with JSON from ID3 (MP3) tags from a large database of media files. Lots of null characters in there for inexplicable reasons.

Thanks for the library!

adamjernst avatar Oct 21 '11 19:10 adamjernst

Bump. I would like to see this fixed too. I agree that the security issue is mitigated by using the NSString class, and that valid Unicode and valid JSON should be respected.

derekjensen avatar Jan 24 '12 02:01 derekjensen

+1

filmaj avatar Jan 24 '12 17:01 filmaj

I'm busy writing my own UTF-8 library and have stumbled into the same issue. Right now I'm leaning towards not supporting U+0000 at all, for the same reasons as JSONKit. I'm curious whether anyone has real-world stories of a case where it was essential to support decoding U+0000. Is it possible that the ID3 tags mentioned above by @adamjernst were crafted with malicious intent, or were they simply produced by buggy software?
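
For what it's worth, the strict variant I'm leaning towards is only one extra clause on top of the ICU rules. A sketch (illustrative names, not JSONKit's actual isValidCodePoint):

```objc
#include <stdint.h>
#include <stdbool.h>

// A strict check in the spirit of JSONKit's isValidCodePoint(): besides the
// ICU rules (no surrogates, no noncharacters, nothing past U+10FFFF), it
// also refuses U+0000 -- the clause this whole issue is about.
static bool is_acceptable_code_point(uint32_t c) {
    if (c == 0u)                          { return false; } // reject NUL
    if (c >= 0xD800u && c <= 0xDFFFu)     { return false; } // UTF-16 surrogates
    if (c > 0x10FFFFu)                    { return false; } // out of range
    if ((c >= 0xFDD0u && c <= 0xFDEFu) ||
        (c & 0xFFFEu) == 0xFFFEu)         { return false; } // noncharacters
    return true;
}
```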

bmharper avatar Aug 23 '16 19:08 bmharper