message-format-wg
Address #703: make syntax/data model error fallback clear
Fixes #703
This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general).
In addition, this change uses US rather than UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. Removing that instruction is a normative change because the previous text had "MAY" for the fallback.
I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).
We don't have a teleconference for two weeks, so this is likely to be the first change committed after issuing the tech preview. While this marks a normative change, I believe that it reflects WG consensus.
Looking closely this morning, I see that data model errors are tricky.
- Variant Key Mismatch has specific handling in the pattern selection section. I put some error handling text there in 10481be.
- Missing Fallback Variant is something we're asking for feedback on in Tech Preview. That error would be emitted in pattern selection if we keep it an error. We should permit lazy evaluation of this error.
- Missing Selector Annotation has no home. We decided (after arguing at great length) to require that the annotation be visible in the message (and not to allow inferring the selector via reflection, for example). But we don't check for it anywhere to enforce it. Neither the message parsing nor pattern selection processes care.
- Duplicate Declaration might be ignored if the variable in question is never used in selection or inside the resulting pattern. There should be a home for it. Currently the syntax spec says that the error is produced, but not when or what the fallback behavior is:
> Variables, once declared, MUST NOT be redeclared. A message that does any of the following is not valid and will produce a Duplicate Declaration error during processing:
- Duplicate Option Name is handled in the "option resolution" section of formatting. The error is optional and non-fatal.
Note that the text at the top of formatting talks about creating a message from a data model. Parsing the data model can result in any of these errors and some of them probably can't be ignored by the implementation, so perhaps some text explicitly about creating the message from a DM representation is called for.
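To make the list above concrete, here is a minimal sketch of a single pass that reports four of the data model errors discussed. The `Declaration` and `Message` classes and the `data_model_errors` function are hypothetical and heavily simplified: selectors are modeled as bare variable names rather than expressions, and only plain redeclaration is treated as a Duplicate Declaration. It is not taken from the spec or from any implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    name: str        # variable name without the leading "$"
    annotated: bool  # True if the declared expression carries an annotation


@dataclass
class Message:
    # Hypothetical, simplified data model (an assumption for this sketch).
    declarations: list = field(default_factory=list)
    selectors: list = field(default_factory=list)  # variable names used for selection
    variants: list = field(default_factory=list)   # one list of keys per variant; "*" is catch-all


def data_model_errors(msg):
    """Return the names of the data model errors found in msg."""
    errors = []

    # Duplicate Declaration: a variable is declared more than once.
    seen = set()
    for decl in msg.declarations:
        if decl.name in seen:
            errors.append("Duplicate Declaration")
        seen.add(decl.name)

    # Missing Selector Annotation: a selector has no visible annotation.
    annotated = {d.name for d in msg.declarations if d.annotated}
    for selector in msg.selectors:
        if selector not in annotated:
            errors.append("Missing Selector Annotation")

    if msg.selectors:
        # Variant Key Mismatch: key count differs from selector count.
        for keys in msg.variants:
            if len(keys) != len(msg.selectors):
                errors.append("Variant Key Mismatch")
        # Missing Fallback Variant: no variant consists only of catch-all keys.
        has_fallback = any(
            len(keys) == len(msg.selectors) and all(k == "*" for k in keys)
            for keys in msg.variants
        )
        if not has_fallback:
            errors.append("Missing Fallback Variant")

    return errors
```

Because the pass needs only the data model, it could run right after parsing or right after constructing a message from a data model representation, which is the case the paragraph above raises.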
@eemeli I think the challenge for me is:
Syntax errors relate solely to the ABNF and parsing errors generated from that.
Data model errors are aspects of MF2 that cannot be represented strictly with the syntax. If one is constructing a message from a data model representation, data model errors will be visible as immediate failures, much like syntax errors.
However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist.
In that light, I would probably go back towards an earlier version of this PR and just call out the eager/lazy problem. Along the lines of:
> To start, create the message by parsing a string or creating it from a data model. Any syntax or data model errors result in the fallback pattern.
>
> [!NOTE] Some types of data model error might not be detected during the construction of the message, if the implementation chooses not to evaluate expressions eagerly.
I remember our discussing pattern selection as a step (sometimes a quite trivial one). We should definitely make that clearer.
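The eager/lazy tension described above can be illustrated with a small sketch. The `LazyEnv` class and the `broken` thunk are hypothetical; the sketch only shows how an error that surfaces during evaluation stays hidden when the expression is never evaluated. It says nothing about which errors the spec allows an implementation to skip.

```python
class LazyEnv:
    """Hypothetical environment that evaluates declarations on first use."""

    def __init__(self, declarations):
        # declarations: mapping of name -> zero-argument callable
        self._thunks = dict(declarations)
        self._cache = {}

    def lookup(self, name):
        # Evaluate the declaration only when something refers to it.
        if name not in self._cache:
            self._cache[name] = self._thunks[name]()
        return self._cache[name]


def broken():
    # Stands in for an expression whose problem is found only on evaluation.
    raise ValueError("error detected only when the expression is evaluated")


env = LazyEnv({"firstName": lambda: "Ada", "unused": broken})

# Formatting succeeds because "unused" is never looked up.
greeting = "Hello " + env.lookup("firstName") + "!"
```

An eager implementation would evaluate both declarations up front and report the error even though the pattern never uses `unused`.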
> However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist.
As I also mention in https://github.com/unicode-org/message-format-wg/pull/710#discussion_r1518408912, it's important to note that we only allow laziness in the evaluation of expressions, and not in their representation. Data model errors are not included in the ones that we allow to be ignored: https://github.com/unicode-org/message-format-wg/blob/e76196481b23e6e9245923a1239282e19484efd0/spec/errors.md?plain=1#L19-L21
@catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early?
> @catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early?
I have a separate checking pass that runs after parsing but before formatting that checks for all of the data model errors except "Duplicate Option Name" (which is easiest to check during parsing).
In my implementation, the presence of data model errors doesn't stop formatting from running. This was based on an earlier version of the tests that specified expected output for some messages that had data model errors. It was quite hard to implement this behavior and it would simplify the code if both syntax errors and data model errors were non-recoverable ("non-recoverable" = the whole message gets replaced with a single fallback, rather than trying to present partial input).
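A rough sketch of the "non-recoverable" behavior described here: parse, run a separate checking pass, and replace the whole message with a single fallback if either step reports errors. The function name and the `parse`, `check`, and `format_pattern` callables are hypothetical, and using U+FFFD as the default fallback string is an assumption of this sketch.

```python
FALLBACK = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER (assumed default fallback)


def format_or_fallback(source, parse, check, format_pattern, fallback=FALLBACK):
    """Return (output, errors) for a message source string.

    parse(source) -> (message, syntax_errors)     hypothetical callable
    check(message) -> list of data model errors   hypothetical callable
    format_pattern(message) -> formatted string   hypothetical callable
    """
    message, syntax_errors = parse(source)
    errors = list(syntax_errors)

    # Run the data model checks only if parsing produced a usable message.
    if not errors:
        errors.extend(check(message))

    # Any syntax or data model error replaces the entire output.
    if errors:
        return fallback, errors

    return format_pattern(message), []
```

Keeping the check as its own step means the formatter never sees an invalid message, which is the simplification the comment above is asking for.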
Sorry, didn't have time to follow the rest of the discussion since I last commented!
By the way, my implementation is lazy but it does statically check data model errors, because those errors are easiest to check in a separate pass. I don't see those two things as in conflict with each other, which I think is in line with what @eemeli said.
> discard variables that were never used

Might be a good idea, or not. It depends on when it is done.
For example:
```
.input {$firstName :string}
.input {$lastName :string}
{{Hello {$firstName}!}}
```
This can be interpreted as a "contract" from the developer that the arguments passed to MF2 will include a lastName argument, so a Japanese translator can translate the message as {{Hello {$lastName}!}}
They can probably be dropped at compile time, or runtime.
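For illustration, finding the declarations that the pattern never reaches could look like the sketch below. The `unused_declarations` function and its input shape are hypothetical: each declaration is a pair of a name and the set of variable names its expression references. Whether the result should be used to drop anything is the open question above, since the source message's declarations may act as a contract for translators.

```python
def unused_declarations(declarations, pattern_refs):
    """Return names of declarations not reachable from the pattern.

    declarations: ordered list of (name, set_of_referenced_variable_names)
    pattern_refs: variable names that appear in the selected pattern
    """
    used = set(pattern_refs)

    # A declaration can only refer to earlier ones, so walk backwards and
    # mark everything that a used declaration depends on.
    for name, refs in reversed(declarations):
        if name in used:
            used |= set(refs)

    return [name for name, _ in declarations if name not in used]


# The example message above: both inputs are declared, only one is used.
declared = [("firstName", {"firstName"}), ("lastName", {"lastName"})]
unused = unused_declarations(declared, ["firstName"])
```

Here `unused` contains only `lastName`, which is exactly the variable a translation such as the Japanese one might still need.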
I think that specifying exactly WHEN (at what stage) validation happens should not be in the spec.
I currently validate the data model after parsing but before formatting. By the time the user gets the data model, it is already validated. From the outside it looks like validation happens at parse time, although it does not. I guess this is similar to what Tim does.
If we change the data model from an array of declarations to a map of declarations (as proposed by https://github.com/unicode-org/message-format-wg/issues/718), that would make (some) validation happen earlier: when we build the model, not after.
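A small sketch of why a map moves the check earlier: with an array, duplicates can sit in the model until a later pass looks for them, while building a map forces the question at insertion time. The `build_declarations` function is hypothetical and is not the shape proposed in the linked issue.

```python
def build_declarations(pairs):
    """Build a name -> expression mapping, rejecting redeclarations.

    pairs: iterable of (name, expression) in source order.
    """
    declarations = {}
    for name, expression in pairs:
        # The duplicate is caught while the model is being built.
        if name in declarations:
            raise ValueError("Duplicate Declaration: $" + name)
        declarations[name] = expression
    return declarations
```

Python dictionaries preserve insertion order, so the source order of declarations survives in this sketch; a map type without that guarantee would need to track order separately.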
This tells me that we should try to specify what kind of errors are emitted, but not when.
Just a note that if this PR lands, then the text that was deleted in #710 should be replaced with something like "Assert that the option's identifier does not already exist in the resolved mapping of options." (That could be in this PR or a subsequent one.)
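The suggested assertion might read roughly as follows in code. The `resolve_options` function and the `resolve_value` callable are hypothetical stand-ins for the option resolution steps. The sketch assumes duplicates were already rejected as a data model error, so the assertion documents an invariant and does not report a user-facing error.

```python
def resolve_options(option_pairs, resolve_value):
    """Resolve (identifier, operand) pairs into a mapping of options.

    resolve_value is a hypothetical callable that resolves one operand.
    """
    resolved = {}
    for identifier, operand in option_pairs:
        # Invariant: an earlier check has already ruled out duplicates.
        assert identifier not in resolved, "duplicate option: " + identifier
        resolved[identifier] = resolve_value(operand)
    return resolved
```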
In the 2024-05-06 call we decided to close this without prejudice.