JSONFeed
Support unicode characters in keys
You support emoji in keys, but limit the character set to ASCII instead of supporting Unicode? This seems backwards and unnecessary.
Actually, the spec doesn't say "ASCII"; it says "alphanumeric", which in a Unicode context is not well-defined (are Thai digits OK because they're numeric, but Chinese letters prohibited because they're not alphabetic?).
What is this restriction really trying to achieve?
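To make the ambiguity concrete, here is a minimal sketch that uses Python's `str.isalnum()` as one possible reading of "alphanumeric". This is only Python's interpretation, not anything the spec defines, and the `classify` helper and sample characters are made up for illustration.

```python
import unicodedata

def classify(ch: str) -> tuple[str, bool, bool]:
    """Return (Unicode category, alphanumeric per Python, ASCII alphanumeric)."""
    unicode_alnum = ch.isalnum()               # Python's Unicode-aware definition
    ascii_alnum = ch.isascii() and unicode_alnum
    return unicodedata.category(ch), unicode_alnum, ascii_alnum

# Hypothetical sample characters, chosen to match the cases in the question.
samples = {
    "ASCII digit 7": "7",
    "Thai digit seven": "\u0e57",
    "Chinese letter": "\u4e2d",
    "emoji": "\U0001F600",
}

for label, ch in samples.items():
    print(label, classify(ch))
```

Under this particular definition, both the Thai digit and the Chinese letter count as alphanumeric and the emoji does not, so "alphanumeric" without "ASCII" admits far more than A-Z, a-z, and 0-9.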
It says ASCII explicitly, although it is further restricted to alphanumeric:
https://jsonfeed.org/version/1
Future Compatibility
A version 1 feed will be a valid version 2 feed, and so on. Future versions may add things, but won’t make older feeds invalid.
Key Naming Rules
In future versions, defined keys will always adhere to these rules:
- Leading _ characters are reserved for extensions.
- Keys will be made of alphanumeric characters from the ASCII character set.
- Keys will never contain a . character.
Emphasis mine.
I see – there are actually two separate, incompatible specifications for the characters allowed in keys:
– The one you reference, in the Future Compatibility section, which isn't entirely clear but seems to allow only the characters 0 (U+0030) through 9 (U+0039), A (U+0041) through Z (U+005A), a (U+0061) through z (U+007A), and _ (U+005F).
– The one I was looking at, in the Extensions section, which is even less clear, talks about alphanumeric characters without saying what they are, does not restrict to ASCII, and allows emoji.
I had the same question, but after re-reading, it appears that we (not the authors) are conflating two separate concepts:
- What rules the authors of the spec will follow.
- What rules you, as an extension writer, may follow.
The spec will always use alphanumeric ASCII keys. Extension writers may use emoji. Yes, I agree that there are some things here that should be better defined, but from what I read, there isn't an explicit incompatibility here.
For people adding support for parsing JSON Feed, I think the rules are:
- Keys may be any valid JSON string.
- If a key begins with a _ character, it is an extension.
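A rough sketch of what those two parser rules could look like in practice, assuming the reading above is right. The `split_keys` helper, the `_example_ext` key, and the sample feed are hypothetical and not taken from the spec.

```python
import json

def split_keys(obj: dict) -> tuple[dict, dict]:
    """Separate extension keys (leading '_') from everything else."""
    core = {k: v for k, v in obj.items() if not k.startswith("_")}
    extensions = {k: v for k, v in obj.items() if k.startswith("_")}
    return core, extensions

# Hypothetical feed fragment; the last key is "_" followed by an emoji,
# written as a JSON surrogate-pair escape.
raw = '{"title": "My Feed", "_example_ext": {"about": "..."}, "_\\ud83d\\ude00": true}'

core, extensions = split_keys(json.loads(raw))
print(sorted(core))        # keys the parser treats as core or unknown
print(sorted(extensions))  # keys the parser treats as extensions
```

Note that the parser never has to decide what "alphanumeric" means here: it accepts any key the JSON decoder produces and only looks at the first character.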
If we're looking to improve the spec, I would say:
- UTF-8 only
- Keys which are added to the core specification will only be of the alphanumeric ASCII set (\x30-\x39, \x41-\x5a, \x61-\x7a, \x5f)
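The proposed character set can be written as a one-line check. This is a sketch of the suggestion in this comment, not of anything the spec mandates; the name `is_core_style_key` is made up, and rejecting a leading underscore is my assumption, based on the rule that leading _ characters are reserved for extensions.

```python
import re

# \x30-\x39 (0-9), \x41-\x5a (A-Z), \x61-\x7a (a-z), \x5f (_)
CORE_KEY = re.compile(r"[0-9A-Za-z_]+")

def is_core_style_key(key: str) -> bool:
    """True if key uses only the proposed ASCII set and is not an extension key."""
    # Assumption: a leading "_" marks an extension, so it is excluded here.
    return CORE_KEY.fullmatch(key) is not None and not key.startswith("_")

for key in ["home_page_url", "_example_ext", "na\u00efve", "a.b"]:
    print(repr(key), is_core_style_key(key))
```

An explicit character class is used instead of `\w` because Python's `\w` matches Unicode word characters by default, which would bring the original ambiguity straight back.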
Seems reasonable. :+1:
But why do you need this? The JSON document may be mapped to a class in Java, which brings in the limitations of Java identifiers (alphanumeric UTF-16 characters plus _ and $, not starting with a digit). There are two separate things here: a property name and a map key. For a map key, any Unicode key is fine, but a property name can reasonably be limited. BTW, see "Property Name Guidelines": https://google.github.io/styleguide/jsoncstyleguide.xml
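The property-name versus map-key distinction can be sketched in Python as well. This is only an analogue: Python's identifier rules (`str.isidentifier()`) are not the same as Java's, and `split_for_binding` is a hypothetical helper, not part of any JSON library.

```python
import keyword

def split_for_binding(obj: dict) -> tuple[dict, dict]:
    """Split keys into those usable as attribute names and those left as map keys."""
    props, extra = {}, {}
    for key, value in obj.items():
        # A key can become a property only if the language accepts it as a name.
        if key.isidentifier() and not keyword.iskeyword(key):
            props[key] = value
        else:
            extra[key] = value  # still perfectly valid as a plain map key
    return props, extra

props, extra = split_for_binding({"title": 1, "1st": 2, "class": 3, "\U0001F600": 4})
print(sorted(props), sorted(extra))
```

Keys that fail the identifier test are not lost; they simply stay in a dictionary instead of becoming fields, which is the same trade-off a Java binding would face.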