
Support unicode characters in keys

Open shrx opened this issue 8 years ago • 7 comments

You support emojis in keys, but limit the character set to ASCII instead of supporting unicode? This seems backwards and unnecessary.

shrx avatar May 18 '17 08:05 shrx

Actually, the spec doesn't say "ASCII"; it says "alphanumeric", which in a Unicode context is not well-defined (are Thai digits OK because they're numeric, but Chinese letters prohibited because they're not alphabetic?).
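To make the ambiguity concrete, here is a small Python sketch (not part of the spec; the sample characters and the helper name `describe` are chosen only for illustration). It reports the Unicode general category of a few characters next to what Python's own `str.isalnum()` considers alphanumeric, which is just one of several possible definitions.

```python
import unicodedata

# Sample characters picked for illustration only.
samples = {
    "ASCII digit": "7",
    "Thai digit": "\u0e57",         # THAI DIGIT SEVEN
    "Chinese character": "\u4e2d",  # CJK ideograph
    "emoji": "\U0001F600",          # GRINNING FACE
}

def describe(ch):
    """Return (Unicode general category, str.isalnum(), str.isascii()) for one character."""
    return unicodedata.category(ch), ch.isalnum(), ch.isascii()

for label, ch in samples.items():
    category, alnum, ascii_only = describe(ch)
    print(f"{label}: category={category} isalnum={alnum} isascii={ascii_only}")
```

Under Python's definition both the Thai digit and the Chinese character count as alphanumeric while the emoji does not, so "alphanumeric" alone does not imply ASCII.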

What is this restriction really trying to achieve?

NorbertLindenberg avatar May 20 '17 11:05 NorbertLindenberg

It says ASCII explicitly, although it is further restricted to alphanumeric:

https://jsonfeed.org/version/1

Future Compatibility

A version 1 feed will be a valid version 2 feed, and so on. Future versions may add things, but won’t make older feeds invalid.

Key Naming Rules

In future versions, defined keys will always adhere to these rules:

  • Leading _ characters are reserved for extensions.
  • Keys will be made of alphanumeric characters from the ASCII character set.
  • Keys will never contain a . character.

Emphasis mine.

shrx avatar May 22 '17 07:05 shrx

I see – there are actually two separate, incompatible specifications for the characters allowed in keys:

– The one you reference, in the Future Compatibility section, which isn't entirely clear but seems to allow only the characters 0 (U+0030) through 9 (U+0039), A (U+0041) through Z (U+005A), a (U+0061) through z (U+007A), and _ (U+005F).

– The one I was looking at, in the Extensions section, which is even less clear, talks about alphanumeric characters without saying what they are, does not restrict to ASCII, and allows emoji.

NorbertLindenberg avatar May 22 '17 12:05 NorbertLindenberg

I had the same question, but after re-reading, it appears that we (not the authors) are conflating two separate concepts:

  1. What rules the authors of the spec will follow.
  2. What rules you, as an extension writer, may follow.

The spec will always use alphanumeric ASCII keys. Extension writers may use emoji. Yes, I agree that there are some things here that should be better defined, but from what I read, there isn't an explicit incompatibility here.

For people adding support for parsing JSON Feed, I think the rules are:

  • Keys may be any valid JSON string.
  • If a key begins with a _ character, it is an extension.
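As a rough sketch of those two parser rules in Python (my reading of the thread, not an official algorithm; `split_keys` and the sample feed object are hypothetical):

```python
import json

def split_keys(obj):
    """Split a parsed JSON object into (core, extensions) by the leading-underscore rule."""
    core = {k: v for k, v in obj.items() if not k.startswith("_")}
    extensions = {k: v for k, v in obj.items() if k.startswith("_")}
    return core, extensions

# Hypothetical feed fragment; the last key is "_" followed by an emoji,
# written as a JSON surrogate-pair escape.
raw = r'''{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example",
  "_blue_shed": {"explicit": false},
  "_\ud83d\ude00": true
}'''

core, extensions = split_keys(json.loads(raw))
print(sorted(core))        # keys treated as spec-defined or unknown
print(sorted(extensions))  # keys treated as extensions
```

The parser accepts any key that is a valid JSON string and only looks at the first character to decide whether it is an extension.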

skyzyx avatar Jul 03 '17 19:07 skyzyx

If we're looking to improve the spec, I would say:

  • UTF-8, only
  • Keys which are added to the core specification will only be of the alphanumeric ASCII set (\x30-\x39, \x41-\x5a, \x61-\x7a, \x5f)
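A minimal Python check for the proposed character set might look like this (the name `is_core_style_key` is made up for the example, and the pattern simply transcribes the byte ranges listed above):

```python
import re

# 0-9 (\x30-\x39), A-Z (\x41-\x5a), a-z (\x61-\x7a), and _ (\x5f).
# Explicit ranges are used instead of \w, which would also match non-ASCII letters and digits.
CORE_KEY = re.compile(r"[0-9A-Za-z_]+")

def is_core_style_key(key):
    """Return True if key uses only the proposed core character set."""
    return CORE_KEY.fullmatch(key) is not None

for key in ["date_published", "title", "na\u00efve", "a.b", "\u0663"]:
    print(repr(key), is_core_style_key(key))
```

This only tests the character set. It does not enforce that core keys avoid a leading _, which the spec reserves for extensions.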

skyzyx avatar Jul 03 '17 19:07 skyzyx

Seems reasonable. :+1:

shrx avatar Jul 05 '17 14:07 shrx

But why do you need this? A JSON document may be mapped to a class in Java, in which case the limitations of Java identifiers apply (alphanumeric UTF-16 characters plus _ and $, not starting with a digit). There are two separate things here: property names and map keys. For a map key, any Unicode string is fine, but property names can reasonably be limited. BTW, see "Property Name Guidelines": https://google.github.io/styleguide/jsoncstyleguide.xml

stokito avatar Jun 16 '20 22:06 stokito