message-format-wg
message-format-wg copied to clipboard
[FEEDBACK] Simpler formulation of Pattern Selection
I think the algorithm for variant selection is much too complicated. That is, I think we can structure it in a way that gets the same results, but is not as complicated to explain — and matches a simpler and more efficient implementation that (a) doesn’t involve sorting, (b) is single-pass, and (c) can be implemented to have a fast exit.
This was sparked by the discussion around "resolved value" being needed in pattern selection. The 'dot' notation is used for convenience here, but needn't be in the fleshed-out text.
This algorithm avoids the complications involved in sorting, and allows for a single pass to find the best pattern. As in the sorting algorithm, there may be multiple key-lists that are "as good" as one another, and in that case the first of that group is chosen.
Changes from the initial version are in italics — and the Optimizations section is all new.
Definitions
- Define a
selector-list = 1*(s selector)
, as in thematch-statement
production - Define a
key-list = key *(s key)
, as in thevariant
production
The Pattern Selection process depends on two capabilities of selectors:
- selector.match(key) returns a value in {fail, ok}
- selector.compare(key1, key2) returns a value in {worse, same, better}
The compare() is checking to see that key2 is better/same/worse than key2. It is only ever called if key1 and key2 are ok.
Using these, list versions are defined in a natural way (see below for details):
- selector-list.match(key-list)
- selector-list.compare(key-list1, key-list2)
Determining which of a message's patterns is formatted
(where there are selectors)
- Let bestVariant be undefined.
- For each variant in the list:
- Let match be selector-list.match(variant.key-list)
- If match = fail, continue the for-loop
- Else if bestVariant is undefined, let bestVariant be variant
- Else if the selector-list.compare(variant.key-list, bestVariant.key-list) = better, then let bestVariant be variant
- Continue loop
- If the loop terminated, return bestVariant
- // We are guaranteed to have one, since there is a key-list with all *’s
In other words, the result is fail if any selector.match(key) value = fail, else ok.
Determining selector-list.match(key-list)
- Let result be ok
- For each selector, key1 in the selector-list, key-list: (ie, taking the i-th element of each list)
- If key = "*", continue loop
- Let value be selector.match(toNfc(key))
- If value = fail, return fail
- Else continue loop
- If the loop terminated, return result.
Determining selector-list.compare(key-list1, key-list2)
- For each selector, key1, key2 in the selector-list, key-list1, key-list2: (ie., taking the i-th element of each list)
- Let result be selector.compare(key1, key2)
- If result ≠ same, return result
- Else continue loop
- If the loop terminated, return same.
Optimizations (Optional)
There are various optimizations that have the same results, but that can improve the performance, sometimes quite significantly.
Best Value
One of them is to modify * selector.match(key) to return an additional value, best. The use of this value allows for early termination. The way it works is that when a key-list is found where every key is best (meaning no other key is better, though other keys could be the same), then the selection process can terminate. If there is no best value (an odd but possible case), the algorithm behaves as before. That involves the following small changes (in italics):
Definitions
- selector.match(key) returns a value in {fail, ok, best}
Determining which of a message's patterns is formatted
- If match = fail, continue the for-loop
- Else if match = best, return the variant
- Else if setVariant is undefined, let bestVariant be variant
Determining selector-list.match(key-list)
- Let result be best
…- If key = "*", let result be ok, and continue loop …
- Else if value is ok, let result be ok
- Else continue loop
In other words, the result is fail if any selector.match(key) value = fail, else best if every selector.match(key) value = best, else ok.
Example
In the following, checking for the best value can eliminate (on average) half of the key-list checks in the following set of variants.
.match $v1 $v2
a a {{…}}
a b {{…}}
a c {{…}}
a * {{…}}
b a {{…}}
b b {{…}}
b c {{…}}
b * {{…}}
c a {{…}}
c b {{…}}
c c {{…}}
* * {{…}}
Reducing function calls
The selector.compare(key1, key2) is only ever called where key2 does not have any fail values. Thus an implementation need only call one function, selector.compare, if that function is enhanced to have values: {fails, worse, same, better, best}
- Let compare-value be selector-list.compare(bestVariant.key-list)
- If compare-value = fail, continue the for-loop
- Else if compare-value = best, return the variant
- Else if bestVariant is undefined, let bestVariant be variant
- Else if compare-value = better, then let bestVariant be variant
CURRENT TEXT
https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#pattern-selection ...
To determine which variant best matches a given set of inputs, each selector is used in turn to order and filter the list of variants.
Each variant with a key that does not match its corresponding selector is omitted from the list of variants. The remaining variants are sorted according to the selector's key-ordering preference. Earlier selectors in the matcher's list of selectors have a higher priority than later ones.
When all of the selectors have been processed, the earliest-sorted variant in the remaining list of variants is selected.
This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here.
Resolve Selectors
First, resolve the values of each selector:
- Let
res
be a new empty list of resolved values that support selection. - For each selector
sel
, in source order,- Let
rv
be the resolved value ofsel
. - If selection is supported for
rv
:- Append
rv
as the last element of the listres
.
- Append
- Else:
- Let
nomatch
be a resolved value for which selection always fails. - Append
nomatch
as the last element of the listres
. - Emit a Bad Selector error.
- Let
- Let
The form of the resolved values is determined by each implementation, along with the manner of determining their support for selection.
Resolve Preferences
Next, using res
, resolve the preferential order for all message keys:
- Let
pref
be a new empty list of lists of strings. - For each index
i
inres
:- Let
keys
be a new empty list of strings. - For each variant
var
of the message:- Let
key
be thevar
key at positioni
. - If
key
is not the catch-all key'*'
:- Assert that
key
is a literal. - Let
ks
be the resolved value ofkey
in Unicode Normalization Form C. - Append
ks
as the last element of the listkeys
.
- Assert that
- Let
- Let
rv
be the resolved value at indexi
ofres
. - Let
matches
be the result of calling the method MatchSelectorKeys(rv
,keys
) - Append
matches
as the last element of the listpref
.
- Let
The method MatchSelectorKeys is determined by the implementation.
It takes as arguments a resolved selector value rv
and a list of string keys keys
,
and returns a list of string keys in preferential order.
The returned list MUST contain only unique elements of the input list keys
.
The returned list MAY be empty.
The most-preferred key is first,
with each successive key appearing in order by decreasing preference.
The resolved value of each key MUST be in Unicode Normalization Form C ("NFC"), even if the literal for the key is not.
If calling MatchSelectorKeys encounters any error, a Bad Selector error is emitted and an empty list is returned.
Filter Variants
Then, using the preferential key orders pref
,
filter the list of variants to the ones that match with some preference:
- Let
vars
be a new empty list of variants. - For each variant
var
of the message:- For each index
i
inpref
:- Let
key
be thevar
key at positioni
. - If
key
is the catch-all key'*'
:- Continue the inner loop on
pref
.
- Continue the inner loop on
- Assert that
key
is a literal. - Let
ks
be the resolved value ofkey
. - Let
matches
be the list of strings at indexi
ofpref
. - If
matches
includesks
:- Continue the inner loop on
pref
.
- Continue the inner loop on
- Else:
- Continue the outer loop on message variants.
- Let
- Append
var
as the last element of the listvars
.
- For each index
Sort Variants
Finally, sort the list of variants vars
and select the pattern:
- Let
sortable
be a new empty list of (integer, variant) tuples. - For each variant
var
ofvars
:- Let
tuple
be a new tuple (-1,var
). - Append
tuple
as the last element of the listsortable
.
- Let
- Let
len
be the integer count of items inpref
. - Let
i
belen
- 1. - While
i
>= 0:- Let
matches
be the list of strings at indexi
ofpref
. - Let
minpref
be the integer count of items inmatches
. - For each tuple
tuple
ofsortable
:- Let
matchpref
be an integer with the valueminpref
. - Let
key
be thetuple
variant key at positioni
. - If
key
is not the catch-all key'*'
:- Assert that
key
is a literal. - Let
ks
be the resolved value ofkey
. - Let
matchpref
be the integer position ofks
inmatches
.
- Assert that
- Set the
tuple
integer value asmatchpref
.
- Let
- Set
sortable
to be the result of calling the methodSortVariants(sortable)
. - Set
i
to bei
- 1.
- Let
- Let
var
be the variant element of the first element ofsortable
. - Select the pattern of
var
.
SortVariants
is a method whose single argument is
a list of (integer, variant) tuples.
It returns a list of (integer, variant) tuples.
Any implementation of SortVariants
is acceptable
as long as it satisfies the following requirements:
- Let
sortable
be an arbitrary list of (integer, variant) tuples. - Let
sorted
beSortVariants(sortable)
. -
sorted
is the result of sortingsortable
using the following comparator:-
(i1, v1)
<=(i2, v2)
if and only ifi1 <= i2
.
-
- The sort is stable (pairs of tuples from
sortable
that are equal in their first element have the same relative order insorted
).