
Relax variant-name grammar

Open · stasm opened this issue 7 years ago · 15 comments

variant-name is currently defined as one or more words.

/* exclude whitespace and [ \ ] { } */
word ::= (((char - line-break) - inline-space) - [#x5b#x5c#x5d#x7b#x7d])+
variant-key ::= number | word (_ word)*
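
To make the current rule concrete, here is a minimal Python sketch that translates the quoted productions into a regular expression. It assumes that line-break means CR or LF, inline-space means space or tab, `_` means one or more inline spaces, `char` is any code point, and `number` is an optional minus sign, digits and an optional fraction. None of these definitions are quoted above, so treat them as approximations.

```python
import re

# Assumptions, not quoted from the spec: line-break is CR or LF, inline-space
# is space or tab, "_" is one or more inline spaces, "char" is any code point,
# and a number is an optional minus, digits, and an optional fraction.
WORD = r"[^\r\n \t\[\\\]{}]+"  # excludes whitespace and [ \ ] { }
SPACE = r"[ \t]+"
NUMBER = r"-?[0-9]+(?:\.[0-9]+)?"

# variant-key ::= number | word (_ word)*
VARIANT_KEY = re.compile(rf"{NUMBER}|{WORD}(?:{SPACE}{WORD})*")


def matches_variant_key(text: str) -> bool:
    """Return True if the whole string matches the variant-key production."""
    return VARIANT_KEY.fullmatch(text) is not None


for key in ["other", "nominative case", "männlich", "15 ducks", "a]b", ""]:
    print(f"{key!r:20} {matches_variant_key(key)}")
```

Under these assumptions the production by itself also accepts `15 ducks` through the `word` branch, because digits are not excluded from `word`; whether it is rejected depends on how a parser chooses between the two alternatives.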

We don't use words anywhere else and I find it weird that they forbid so many arbitrary characters. Should we relax it to only forbid ] and maybe [? Should we allow escaping with \]? As in: [variant name with a closing bracket \] looks like this].

Additionally, I don't see why we have to force variant-name to be non-blank. [] should parse as a variant name of "" (a zero-width string). It's not terribly useful, but it's consistent.
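
A hypothetical reader for the relaxed rule floated above (forbid only an unescaped `]`, accept `\]` as an escape, and let `[]` yield the empty string) might look like the following Python sketch. `read_bracketed_key` is an illustrative name, not part of any Fluent implementation, and keeping any other backslash as a literal character is an assumption.

```python
from __future__ import annotations


def read_bracketed_key(source: str, pos: int) -> tuple[str, int]:
    """Read a variant key whose opening "[" is at source[pos].

    Hypothetical relaxed rule: any character is allowed except an unescaped
    "]", and a backslash directly before "]" escapes it. Any other backslash
    is kept as-is. Returns the decoded key and the index after the closing "]".
    """
    if pos >= len(source) or source[pos] != "[":
        raise SyntaxError(f"expected '[' at index {pos}")
    chars: list[str] = []
    i = pos + 1
    while i < len(source):
        ch = source[i]
        if ch == "\\" and source[i + 1 : i + 2] == "]":
            chars.append("]")  # escaped closing bracket
            i += 2
        elif ch == "]":
            return "".join(chars), i + 1  # unescaped "]" ends the key
        else:
            chars.append(ch)
            i += 1
    raise SyntaxError("unterminated variant key")


print(read_bracketed_key(r"[variant name with a closing bracket \] looks like this]", 0)[0])
print(repr(read_bracketed_key("[]", 0)[0]))
```

Note that this sketch also accepts line breaks and invisible characters inside the brackets, which is the kind of edge case raised in the reply below.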

stasm · Feb 09 '18 12:02

My concern here is that it opens up room for a lot of edge cases that may be confusing and potentially hard to handle. In particular:

  • Empty or accidental variant names with copy-pasted invisible characters that later fail to match
  • Characters that are unlikely to be useful but since we allowed them will block us from using those characters anywhere else
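
One way to surface the first concern is to flag characters a reviewer cannot see. The Python sketch below uses `unicodedata.category` to report control (`Cc`), format (`Cf`) and separator (`Z*`) characters other than the ASCII space. The choice of categories is an assumption for illustration, not a rule from the Fluent spec.

```python
from __future__ import annotations

import unicodedata


def invisible_chars(key: str) -> list[tuple[int, str, str]]:
    """Report (index, code point, category) for characters a reviewer can't see.

    Flags controls (Cc), format characters (Cf) and separators (Z*) other than
    the ASCII space. The category choice is an assumption for illustration.
    """
    found = []
    for index, ch in enumerate(key):
        category = unicodedata.category(ch)
        if category in ("Cc", "Cf") or (category.startswith("Z") and ch != " "):
            found.append((index, f"U+{ord(ch):04X}", category))
    return found


pasted = "one\u200b"  # "one" followed by U+200B ZERO WIDTH SPACE
print(pasted == "one", invisible_chars(pasted))
```

The two strings render identically but do not compare equal, so a variant keyed on the pasted version would never be selected for `one`.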

I'm not completely against pursuing this direction (I know Stas was in favor of it from the very beginning), but I'd like to see use cases and/or examples of users wanting this feature. I'm concerned about limiting our future options by relaxing the syntax when no one is asking for it.

zbraniecki · Feb 09 '18 15:02

I think there's value in variant-key being strict. This is mostly about the readability of the message. At the point where we have variants, that already goes down a fair amount. Having complexity in there would further degrade that.

Side question, is 15 ducks an OK variant-key?

Pike · Feb 09 '18 17:02

15 ducks is not an OK variant-key because it starts with a digit (1), so the parser expects it to be a number.
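
The reply describes one-character lookahead: the first character of the key decides which branch the parser commits to. Below is a minimal Python sketch of that behavior, not the reference parser's code; the number and name patterns are assumptions that mirror the `word` production quoted earlier.

```python
from __future__ import annotations

import re

# Assumed patterns: a number is an optional minus, digits and an optional
# fraction; a name is one or more words as in the production quoted above.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
NAME = re.compile(r"[^\r\n \t\[\\\]{}]+(?:[ \t]+[^\r\n \t\[\\\]{}]+)*")
NUMBER_START = set("-0123456789")


def parse_variant_key(body: str) -> tuple[str, str]:
    """Parse the text between [ and ] with one character of lookahead."""
    if body[:1] in NUMBER_START:
        # Committed to the number branch: the whole key must be a number.
        if NUMBER.fullmatch(body) is None:
            raise SyntaxError(f"expected a number, got {body!r}")
        return ("number", body)
    if NAME.fullmatch(body) is None:
        raise SyntaxError(f"invalid variant name {body!r}")
    return ("name", body)


for body in ["15", "ducks 15", "15 ducks"]:
    try:
        print(body, "->", parse_variant_key(body))
    except SyntaxError as err:
        print(body, "->", err)
```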

zbraniecki · Feb 09 '18 18:02

Use-cases

There are two use-cases which would be well-served by relaxing the grammar of VariantName:

  1. Term attribute values in the native language.
  2. Free-form variable values.

Native term attribute values

Term attributes are private to the current language. They may represent grammatical concepts which are foreign to English grammar. Consequently, it may be hard to give them English names. To fully embrace the asymmetric design of terms and their attributes, I'd like to make it possible for attribute values to be written in the target language of the localization.

-brand-name = Firefox
    .Genus = männlich 

(Ideally, the attribute's name could also contain non-ASCII characters. I'll get back to this in a second.)

Variable values

There may be cases where the localizer wants to provide a specialized translation for one of the possible values of a variable.

# The ʻ character is U+02BB, outside of the Basic Latin block.
welcome = { $state ->
    [Hawaiʻi] Aloha!
   *[other] Welcome!
}

In most cases, dynamic references (#80) will be the preferred way of implementing such UIs, but they have the drawback of requiring the developer's intervention. For lightweight, one-off customizations, simple SelectExpressions like the one above may still be a good and cheap solution.
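
Whether the `[Hawaiʻi]` variant is picked depends on an exact string match between the variable's value and the key. The Python sketch below assumes plain code-point equality with no normalization; `select_variant` is a hypothetical helper, not a Fluent API.

```python
from __future__ import annotations


def select_variant(selector: str, variants: dict[str, str], default: str) -> str:
    """Return the variant whose key equals selector, else the default variant.

    Assumption: keys are compared with plain string equality (code point by
    code point), with no normalization or case folding.
    """
    return variants.get(selector, variants[default])


welcome = {"Hawai\u02bbi": "Aloha!", "other": "Welcome!"}  # U+02BB okina
print(select_variant("Hawai\u02bbi", welcome, "other"))
print(select_variant("Hawai'i", welcome, "other"))  # U+0027 apostrophe
```

If the application passes U+0027 APOSTROPHE instead of U+02BB, the default variant is used, so the key must use exactly the character the developer sends.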

Proposed design

I suggest that we restrict the [name] syntax to identifiers only. This is actually more restrictive than today, because today's grammar also allows whitespace.

For names containing any other characters, including whitespace, ] or any non-Latin characters, I propose a new syntax: ["name"]. The standard StringExpression grammar would apply here.

This design has the advantage of being explicit about digits: [15 ducks] (illegal) vs. ["15 ducks"], as well as about whitespace: [ trim whitespace ] vs. [" keep whitespace "].
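
A minimal Python sketch of the proposed design follows. It assumes an identifier of the form `[a-zA-Z][a-zA-Z0-9_-]*`, a number of the form optional minus, digits and optional fraction, and a quoted string that supports only backslash-quote and backslash-backslash escapes and rejects line breaks. These shapes and the helper names are illustrative assumptions, not the grammar from #118.

```python
from __future__ import annotations

import re

# Assumed shapes, for illustration only.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def read_quoted(text: str) -> str:
    """Decode a whole quoted string; escapes: backslash-quote, backslash-backslash."""
    out: list[str] = []
    i = 1  # skip the opening quote
    while i < len(text):
        ch = text[i]
        if ch in "\r\n":
            raise SyntaxError("string literals may not contain line breaks")
        if ch == "\\" and text[i + 1 : i + 2] in ('"', "\\"):
            out.append(text[i + 1])  # escaped quote or backslash
            i += 2
            continue
        if ch == '"':
            if i != len(text) - 1:
                raise SyntaxError("unexpected text after the closing quote")
            return "".join(out)
        out.append(ch)
        i += 1
    raise SyntaxError("unterminated string literal")


def parse_key(source: str) -> tuple[str, str]:
    """Parse a complete "[...]" variant key under the proposed design."""
    if not (source.startswith("[") and source.endswith("]")):
        raise SyntaxError("a variant key must be wrapped in [ and ]")
    inner = source[1:-1].strip(" ")  # padding around the key is not significant
    if inner.startswith('"'):
        return ("string", read_quoted(inner))
    if NUMBER.fullmatch(inner):
        return ("number", inner)
    if IDENTIFIER.fullmatch(inner):
        return ("identifier", inner)
    raise SyntaxError(f'{inner!r} needs the ["..."] form')
```

With these rules `[15 ducks]` and `[männlich]` are rejected, while `["15 ducks"]` and `["männlich"]` parse as strings, and the padding in `[ one ]` is not part of the key.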

Lastly, although this might be best discussed in a separate issue, we could consider allowing attributes names to contain arbitrary characters using a new ."attr name" = Value syntax. (Moved to #117.)

stasm · May 21 '18 13:05

If I'm not mistaken, we haven't seen anyone request this yet. Can it wait until after 1.0 and an actual user request?

zbraniecki · May 21 '18 15:05

The sooner we relax the grammar, the less trouble we'll run into due to cross-channel.

We haven't seen much use of custom term attributes yet, which is why I think we're not seeing these requests. I'd like to default to Unicode wherever possible. It's 2018.

stasm · May 21 '18 15:05

As I stated before, the risk of Unicode here is that it makes edge cases legitimate unless you blacklist them. Things like \n, \t, many forms of whitespace, etc. all become valid variant-name characters. It would be nice to research the concept of visible/non-breaking/non-whitespace Unicode characters and maybe use that notion. Maybe we can use UAX31? My worry is that I don't feel I know enough about the traps and limitations of allowing Unicode in syntax, and I've seen how much time languages that wanted to introduce it have spent trying. Do you think we're somehow immune to such issues? Is there any programming language that did this successfully without UAX31?
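
As a rough illustration of what a UAX31-based check could look like, Python's `str.isidentifier()` accepts identifiers built on the XID_Start and XID_Continue properties (PEP 3131), which are defined in UAX31. The sketch below NFC-normalizes first so composed and decomposed spellings agree. It is an approximation only: Python also allows a leading underscore, and the exact character sets depend on the Unicode version bundled with the interpreter.

```python
from __future__ import annotations

import unicodedata


def normalize_name(name: str) -> str | None:
    """Return the NFC form of name if it is an identifier, else None.

    Approximation: str.isidentifier() uses XID_Start/XID_Continue (PEP 3131),
    which derive from UAX #31, but it also accepts a leading underscore.
    """
    nfc = unicodedata.normalize("NFC", name)
    return nfc if nfc.isidentifier() else None


for candidate in ["männlich", "ma\u0308nnlich", "Hawai\u02bbi", "one\u200b", "15ducks"]:
    print(repr(candidate), "->", normalize_name(candidate))
```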

zbraniecki · May 21 '18 15:05

I'm trying to make the point that this is about matching strings, not identifiers. In the context of identifiers, however, UAX31 could help us, yes.

In my proposal, ["text"] would follow the same semantics and parsing logic as StringExpressions do. In fact, I'm proposing that we make the current syntax ([text]) even more restricted than it is right now, so your concern is mitigated.

stasm · May 21 '18 15:05

key = { key2["
"] }

Would that make such code valid FTL?

zbraniecki · May 21 '18 16:05

No, because StringExpressions may not contain newlines.

stasm · May 21 '18 16:05

Oh! Thanks for explaining it to me! No objections.

zbraniecki · May 21 '18 16:05

> Lastly, although this might be best discussed in a separate issue, we could consider allowing attributes names to contain arbitrary characters using a new ."attr name" = Value syntax.

I filed #117 to keep this out of scope of this issue.

stasm · May 21 '18 19:05

I'd like to nominate this for Syntax 0.6 rather than 0.7. Or, perhaps we could separate this into two changes:

  • Syntax 0.6: Restrict variant keys to numbers and identifiers only.
  • Syntax 0.7: Allow string expressions as variant keys, too.

@Pike -- what do you think? #118 implements both in one go, but I'll be happy to split it in two.

stasm · May 23 '18 12:05

TBH, I don't like the proposal. I didn't get to actually read it before just now, sorry.

What I don't like in particular is introducing identifiers and their constraints in an unrelated context.

I think what we should do is more along the lines of just supporting StringExpressions as VariantNames and un-special-casing the " in that context. That also implies dropping NumberExpression from grammar.mjs and doing the number detection in abstract.mjs, I think. Then 15 ducks and ducks 15 would both work (which they currently don't).
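
Under this approach the grammar would only capture the text between the brackets, and a later pass would decide whether that text is a number. A Python sketch of such a pass follows; `classify_key` and the number pattern are assumptions for illustration, not code from `abstract.mjs`.

```python
from __future__ import annotations

import re

# Assumed number shape; anything else between the brackets stays a name.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def classify_key(raw: str) -> tuple[str, str]:
    """Post-parse step: the grammar captured raw text, this pass types it."""
    if NUMBER.fullmatch(raw):
        return ("number", raw)
    return ("name", raw)


for raw in ["15", "15 ducks", "ducks 15", "männlich"]:
    print(repr(raw), "->", classify_key(raw))
```

Because the grammar no longer commits to a number on the first digit, `15 ducks` simply falls through to a name here.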

In particular splitting this between 0.6 and 0.7 leaves us without the ability to express [männlich], and that's not what 0.6 is about.

Pike · May 23 '18 13:05

You're right about 0.6, let's keep this in 0.7 for now, thanks.

Do you mean introducing a new production, say, bracketed_text which is like quoted_text but delimited with [ and ]? Would you require only ] to be escaped when needed? I'm concerned about adding unnecessary complexity which extends the set of contextually-special characters.

What I think this boils down to is how we look at [ and ]. Are they supposed to delimit the variant name or to differentiate the variant's key from the variant's value? A useful test to perform is this:

padded-variant-keys =
    { $num ->
        [one  ] One
        [many ] Many
       *[other] Other
    }

In the example above, what are the variants' keys?

  • one, many and other, or
  • one__, many_, other (using _ for spaces).

If it's the former, then [ and ] are not delimiters. There's extra trimming logic involved, which means that variant keys are not simply StringExpressions with a different delimiter.

If it's the latter, then $num == 1 will likely not match against one__ because the plural category is called one, without the trailing spaces. I don't think we want this behavior.
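
The difference between the two readings can be checked directly. The Python sketch below stores the raw bracket contents from the example and compares them with the CLDR plural category `one`, which English uses for the number 1, once with padding trimmed and once without. `select` is a hypothetical helper.

```python
# Raw text between [ and ] in the padded-variant-keys example, with values.
VARIANTS = [("one  ", "One"), ("many ", "Many"), ("other", "Other")]
DEFAULT_VALUE = "Other"  # the *[other] variant


def select(category: str, trim_keys: bool) -> str:
    """Return the value of the first variant whose key equals the category."""
    for raw_key, value in VARIANTS:
        key = raw_key.strip(" ") if trim_keys else raw_key
        if key == category:
            return value
    return DEFAULT_VALUE


# In English, CLDR plural rules map the number 1 to the category "one".
print(select("one", trim_keys=True))
print(select("one", trim_keys=False))
```

Without trimming, `one` never matches `one__`, so the default variant is shown for `$num == 1`.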

["string literal"] solves this by sticking to a single set of rules for what makes a character special. In other words, we already have a grammar production for delimited text, and it's quoted_text.

I'm also not sold on moving the parsing of numbers to abstract.mjs. I don't want to make them second-class citizens of the grammar. There might be ways around it, though. For instance, we could move the brackets inside each of the VariantKey's alternatives to make sure the number is parsed fully: "[" (NumberExpression | VariantName) "]" could become ("[" NumberExpression "]" | "[" bracketed_text "]").
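
Here is a sketch of the second grammar shape, with the brackets moved inside each alternative. The number branch only succeeds if the closing `]` immediately follows the number; otherwise the parser falls back to bracketed text. This Python version uses regular expressions, and treating `bracketed_text` as any run of characters other than `]` is an assumption, since the thread does not define it.

```python
from __future__ import annotations

import re

# Assumed shapes: bracketed_text is any run of characters other than "]".
NUMBER_KEY = re.compile(r"\[(-?[0-9]+(?:\.[0-9]+)?)\]")
TEXT_KEY = re.compile(r"\[([^\]]*)\]")


def parse_variant_key(source: str, pos: int = 0) -> tuple[str, str, int]:
    """Try "[" number "]" first, then fall back to "[" bracketed_text "]".

    Returns (kind, key, index after the closing bracket).
    """
    m = NUMBER_KEY.match(source, pos)
    if m:
        return ("number", m.group(1), m.end())
    m = TEXT_KEY.match(source, pos)
    if m:
        return ("text", m.group(1), m.end())
    raise SyntaxError(f"expected a variant key at index {pos}")


for key in ["[15]", "[15 ducks]", "[ducks 15]", "[männlich]"]:
    print(key, "->", parse_variant_key(key))
```

Because `[15 ducks]` fails the number branch at the space, it falls through to the text branch, so numbers stay in the grammar while both `15 ducks` and `ducks 15` are accepted.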

stasm · May 23 '18 19:05