
Be more specific about defining resolved values

Open eemeli opened this issue 4 months ago • 7 comments

Originally posted by @macchiati in https://github.com/unicode-org/message-format-wg/issues/1094#issuecomment-3157026080:

I agree that the terminology is very confusing. For example, a resolved value is not a "container" as the following implies (my bolding).

The resolved value of an expression with a :string function contains the string value of the operand of the annotated expression, together with its resolved locale and directionality. None of the options set on the expression are part of the resolved value.

It is instead suggested that an RV is like the class definition given in https://github.com/unicode-org/message-format-wg/blob/1f87da0574e793972d84532b25dc950115734739/spec/formatting.md#resolved-values

While this specification does not require it, a resolved value could be implemented by requiring each function handler to return a value matching the following interface:

interface MessageValue {
  formatToString(): string
  formatToX(): X // where X is an implementation-defined type
  unwrap(): unknown
  resolvedOptions(): { [key: string]: MessageValue }
  match(key: string): boolean
  betterThan(key1: string, key2: string): boolean
  directionality(): 'LTR' | 'RTL' | 'unknown'
  isolate(): boolean
  isLiteralOptionValue(): boolean
}

But that really doesn't sit very well at all with saying "contains the string value". Instead, we'd really have to say exactly what each of the methods does. And we don't do that.

If we are going down this road, we could be more specific, and replace "While this specification does not require it, a resolved value could be implemented by requiring each function handler to return a value matching the following interface:" by the following (we use the "logically equivalent" language at many places in Unicode):

A resolved value is logically equivalent to a value matching the following interface:

We can then just have a standard table with a row for each method, where the second column describes what the method does; that way we won't miss something important (such as whether the unwrap() method produces a value or an error and, if it produces a value, what that value is defined to be).

No implementation is required to use those names or functions internally, but that is a more concrete, uniform way to describe what a function does and does not do
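To make the "logically equivalent" idea more concrete, here is a minimal TypeScript sketch of how a :string resolved value might behave under such an interface. The class name StringValue, its fields, and the reduced interface (formatToX, betterThan, isolate and isLiteralOptionValue are left out) are assumptions for illustration only, not spec text. The match() method is simplified to plain string equality.

```typescript
// Reduced version of the MessageValue interface quoted above.
// Assumption: several methods are omitted to keep the sketch short.
interface MessageValue {
  formatToString(): string
  unwrap(): unknown
  resolvedOptions(): { [key: string]: MessageValue }
  match(key: string): boolean
  directionality(): 'LTR' | 'RTL' | 'unknown'
}

// Hypothetical resolved value for an expression annotated with :string.
class StringValue implements MessageValue {
  value: string
  locale: string
  dir: 'LTR' | 'RTL' | 'unknown'

  constructor(value: string, locale: string, dir: 'LTR' | 'RTL' | 'unknown' = 'unknown') {
    this.value = value
    this.locale = locale
    this.dir = dir
  }

  // The formatted output is the string value itself.
  formatToString(): string {
    return this.value
  }

  // unwrap() produces the string value rather than an error.
  unwrap(): unknown {
    return this.value
  }

  // None of the options set on the expression are part of the resolved value.
  resolvedOptions(): { [key: string]: MessageValue } {
    return {}
  }

  // Simplification: a key matches when it equals the string value.
  match(key: string): boolean {
    return key === this.value
  }

  directionality(): 'LTR' | 'RTL' | 'unknown' {
    return this.dir
  }
}

const greeting = new StringValue('hello', 'en')
console.log(greeting.formatToString(), Object.keys(greeting.resolvedOptions()).length)
```

A per-method table in the spec would state, for each function, what each of these methods returns, which is what the comments in the sketch stand in for.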

BTW

  1. I think 'unknown' above would be much better as 'Y // where Y is an implementation-defined type'.
  2. The above also implies that resolvedOptions has exactly 1 key-value pair, whereas I think the intent is that it return a map with 0 or more key-value pairs.
  3. I find the unwrap() name and explanation to be particularly obscure.
  4. The grammar of the following sentence is a bit fuzzy:
  • The resolved value of an expression could be used as an operand or option value if calling the unwrap() method of its resolved value did not emit an error. (This requires an intermediate variable declaration.) In this use case, the resolvedOptions() method could also provide a set of option values that could be taken into account by the called function.
  • I think it is intended to be something like:
  • The resolved value of an expression can be used as an operand or option value, but only if calling the unwrap() method does not emit an error. In that case, the resolvedOptions() method provides a set of option values that can be taken into account by the called function. These option values may be a subset of those used to create the resolved value.

eemeli avatar Aug 06 '25 06:08 eemeli

I would be very happy for us to replace the current resolved value definition with "logically equivalent" language.

  1. I think 'unknown' above would be much better as 'Y // where Y is an implementation-defined type'.

Would that also work for Variable Resolution? Through that, MessageValue can end up wrapping any user-provided value, and that seems best represented by unknown.

  2. The above also implies that resolvedOptions has exactly 1 key-value pair, whereas I think the intent is that it return a map with 0 or more key-value pairs.

In TypeScript (which is what our code examples use), the square brackets around the key in { [key: string]: MessageValue } wrap around the type expected of the key, so the meaning here is that in this object, all keys must be strings.
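As a small illustration of that index-signature notation, the following sketch shows that the same type accepts zero, one, or several key-value pairs. The stand-in MessageValue interface, the ResolvedOptions alias, the lit helper, and the option names are all assumptions made up for this example.

```typescript
// Minimal stand-in for the spec's MessageValue (assumption: one method is enough here).
interface MessageValue {
  formatToString(): string
}

// The name `key` in the brackets is only a label; the type says that
// every key is a string and every value is a MessageValue.
type ResolvedOptions = { [key: string]: MessageValue }

// Hypothetical helper that wraps a string as a MessageValue.
const lit = (s: string): MessageValue => ({ formatToString: () => s })

const none: ResolvedOptions = {} // zero options
const one: ResolvedOptions = { style: lit('percent') } // one option
const two: ResolvedOptions = {
  minimumFractionDigits: lit('2'),
  useGrouping: lit('never')
}

console.log(Object.keys(none).length, Object.keys(one).length, Object.keys(two).length)
// prints "0 1 2"
```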

  3. I find the unwrap() name and explanation to be particularly obscure.

We ended up with the current language via #1081, our latest attempt to improve this part of the spec. Further work is clearly needed.

  4. The grammar of the following sentence is a bit fuzzy:

@catamorphism, thoughts on this?

eemeli avatar Aug 06 '25 07:08 eemeli

Would that also work for Variable Resolution? Through that, MessageValue can end up wrapping any user-provided value, and that seems best represented by unknown.

I disagree. It should never be "unknown" in the definition of any particular function. It should either be

  1. a type according to the function, such as "string type" or "numeric type" OR
  2. "raises an error" if the function doesn't provide an unwrap() function.

Otherwise unwrap is useless.

Note resolvedOptions() also needs to document whether

  1. it always returns a map (with an empty map if unwrap() returns an error), OR
  2. raises an error if unwrap does.
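The two alternatives for resolvedOptions() can be sketched side by side. This is only an illustration under assumptions: the Resolved class, its unwrappable flag, and the two method names are hypothetical, and option values are simplified to plain strings rather than MessageValue instances.

```typescript
// Assumption: option values are plain strings to keep the sketch short.
type Options = { [key: string]: string }

// Hypothetical resolved value whose unwrap() may fail.
class Resolved {
  value: unknown
  options: Options
  unwrappable: boolean

  constructor(value: unknown, options: Options, unwrappable: boolean) {
    this.value = value
    this.options = options
    this.unwrappable = unwrappable
  }

  unwrap(): unknown {
    if (!this.unwrappable) throw new Error('value cannot be unwrapped')
    return this.value
  }

  // Alternative 1: always return a map, which is empty when unwrap() would fail.
  resolvedOptionsOrEmpty(): Options {
    return this.unwrappable ? { ...this.options } : {}
  }

  // Alternative 2: raise an error whenever unwrap() does.
  resolvedOptionsOrThrow(): Options {
    this.unwrap() // throws if the value cannot be unwrapped
    return { ...this.options }
  }
}

const usable = new Resolved(42, { minimumFractionDigits: '2' }, true)
const unusable = new Resolved(null, { style: 'percent' }, false)

console.log(usable.resolvedOptionsOrEmpty()) // the option map
console.log(unusable.resolvedOptionsOrEmpty()) // an empty map
```

Whichever alternative is chosen, the point is that the spec text would need to say which one applies.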

Typescript is fine, but whatever notation is used has to be clarified to the user. So you need a little paragraph that says that the notation follows Typescript, and then also explain that { [key: string]: MessageValue } means a key-value map, from strings to message values, where each key is an option id from the expression used to create this MessageValue.

macchiati avatar Aug 06 '25 17:08 macchiati

Otherwise unwrap is useless.

It's not useless if you consider non-string formatting targets, like formatted parts. With such, variable values can quite usefully pass through MessageFormat without the formatter touching them. For example, this is possible with the JS implementation, in a browser environment:

import { MessageFormat } from 'messageformat'

const msg = 'An inline {$image} can be embedded.'

const image = document.createElement('img')
image.src = 'spherical-cow.png'

const parts = new MessageFormat('en', msg).formatToParts({ image })

const parent = document.getElementById('msg-target-id')
parent.replaceChildren(...parts.map(part => part.value))

There, the value of image is the JS representation of an HTML element, which is embedded in the message, and ultimately formatted by the DOM. But that's not really relevant from the MF2 processing PoV; it could equivalently be a React element or some other non-localizable value. The key here is that for an implementation supporting such use, unknown really is the only possible type.

Typescript is fine, but whatever notation is used has to be clarified to the user. So you need a little paragraph that says that the notation follows Typescript, and then also explain that { [key: string]: MessageValue } means a key-value map, from strings to message values, where each key is an option id from the expression used to create this MessageValue.

Yeah, probably true. I had not noticed that we only include the TypeScript mention in the Interchange Data Model introduction, and not for the whole spec.

eemeli avatar Aug 06 '25 19:08 eemeli

Some functions don't take an operand, so that could be an empty value. I don't know that unknown specifically makes sense in our context, but functions might also accept many different types. The message processor might not enforce type safety for a function, letting the function deal with it in an opaque manner (void*?).

I'm concerned that we're either too far into defining this or not far enough. At a high level, UMF gets an array of (named) values for use in formatting the message (where formatting means the whole process, including selection). We intend these values to be immutable. We intend them to be "annotatable" (with options or metadata like direction and language). And users (message authors) can define or assign additional values in the message. The specifics don't concern us as long as the behaviors are consistent? Or maybe we do need to clearly define the resolution chain for values and options?

aphillips avatar Aug 06 '25 19:08 aphillips

  4. The grammar of the following sentence is a bit fuzzy:

@catamorphism, thoughts on this?

I like the "only if", and the edit to the last sentence is fine, but I think "(This requires an intermediate variable declaration.)" should stay in there. Perhaps that part needs to be expanded -- what do you think, @macchiati?

catamorphism avatar Aug 06 '25 20:08 catamorphism

Otherwise unwrap is useless.

It's not useless if you consider non-string formatting targets, like formatted parts. With such, variable values can quite usefully pass through MessageFormat without the formatter touching them. For example, this is possible with the JS implementation, in a browser environment:

Good point. There are times when it could be just an opaque blob whose contents are of no known type. My point is really that if it is known to be a real type, then unwrap can be used in function composition. And the function should declare it if that is true. So let me revise my statement:

It should either be

  1. a type according to the function, such as "string type" or "numeric type" or unknown (if it is an opaque blob), OR
  2. "raises an error" if the function doesn't provide an unwrap() function.
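The three cases, and why a declared type helps with function composition, might be sketched as follows. Everything here is an assumption for illustration: the generic Unwrappable interface and the helper functions are hypothetical and are not part of the spec or of any implementation.

```typescript
// Hypothetical generic variant of the interface: T is the type a function
// declares for unwrap(); `unknown` marks an opaque value.
interface Unwrappable<T> {
  unwrap(): T
}

// Case 1a: a function declaring a numeric unwrap() type.
function numberValue(n: number): Unwrappable<number> {
  return { unwrap: () => n }
}

// Case 1b: an opaque pass-through value, so the declared type is unknown.
function opaqueValue(v: unknown): Unwrappable<unknown> {
  return { unwrap: () => v }
}

// Case 2: a function that does not provide unwrap() raises an error instead.
function noUnwrapValue(): Unwrappable<never> {
  return {
    unwrap: () => {
      throw new Error('unwrap() is not supported')
    }
  }
}

// Composition: a later function can rely on the declared numeric type.
function double(input: Unwrappable<number>): Unwrappable<number> {
  return numberValue(input.unwrap() * 2)
}

console.log(double(numberValue(21)).unwrap()) // prints 42
```

With the opaque case, a composing function such as double would have to check the value at run time before using it, which is the cost of declaring unknown.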

macchiati avatar Aug 06 '25 22:08 macchiati

"(This requires an intermediate variable declaration.)"

Perhaps that part needs to be expanded

Yes, it is too fragmentary to work as a parenthetical. If we want to make a point about it, then that should be a separate paragraph and example, such as:

When a resolved value is used as an operand or option value, it requires an intermediate variable declaration as in the following example:

example tbd

macchiati avatar Aug 06 '25 22:08 macchiati