stringref

Does `string.is_usv_sequence` pull its weight?

wingo opened this issue • 3 comments

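`(string.is_usv_sequence str)` is the same as `(i32.ne (i32.const -1) (string.measure_utf8 str))`, since `string.measure_utf8` returns -1 exactly when the string contains an isolated surrogate. Is it useful enough to keep in an MVP?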
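A minimal Python model of the equivalence, assuming a string is represented as a list of UTF-16 code units that may contain isolated surrogates. The functions are hypothetical stand-ins for the Wasm instructions, not any engine's API. `measure_utf8` returns -1 exactly when an isolated surrogate is present, so `is_usv_sequence` is the same test as "measure is not -1".

```python
from typing import Sequence


def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def measure_utf8(units: Sequence[int]) -> int:
    """UTF-8 byte length of a UTF-16 code-unit sequence, or -1 if it
    contains an isolated surrogate (which has no UTF-8 encoding)."""
    length, i = 0, 0
    while i < len(units):
        u = units[i]
        if is_high(u) and i + 1 < len(units) and is_low(units[i + 1]):
            length += 4  # surrogate pair: supplementary code point
            i += 2
        elif is_high(u) or is_low(u):
            return -1  # isolated surrogate
        else:
            length += 1 if u < 0x80 else 2 if u < 0x800 else 3
            i += 1
    return length


def is_usv_sequence(units: Sequence[int]) -> bool:
    """True if every surrogate belongs to a well-formed pair."""
    i = 0
    while i < len(units):
        if is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1]):
            i += 2
        elif is_high(units[i]) or is_low(units[i]):
            return False
        else:
            i += 1
    return True


# The equivalence: is_usv_sequence(s) == (measure_utf8(s) != -1).
samples = [
    [0x68, 0x69],      # "hi"
    [0xD83D, 0xDE00],  # U+1F600 as a surrogate pair
    [0x61, 0xD800],    # "a" + isolated high surrogate
    [0xDC00, 0x62],    # isolated low surrogate + "b"
]
for s in samples:
    assert is_usv_sequence(s) == (measure_utf8(s) != -1)
```

Note that the model's `measure_utf8` has to walk the whole string to produce a length, while `is_usv_sequence` only needs a yes/no answer; that difference is what the rest of this thread turns on.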

The case for keeping it: consider a component-model scenario with an inter-component call to an interface that takes a string, where both sides actually implement strings with stringrefs. In that case we can pass the stringref value directly without copying, but only if the string has no isolated surrogates. We don't need to compute the WTF-8 length to check that; we can rely on the same internal bit that isUSVString would use.
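A sketch of that decision under stated assumptions: `EngineString`, its cached `usv_flag`, and `can_pass_without_copy` are all hypothetical names, and the copying fallback is not modeled. When the engine keeps a cached "is USV" bit, the check is a single field read; otherwise it degrades to a scan for isolated surrogates.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EngineString:
    """Hypothetical engine-side string: UTF-16 code units plus an optional
    cached flag recording whether it is a USV sequence (None = unknown)."""
    units: Tuple[int, ...]
    usv_flag: Optional[bool] = None


def _scan_is_usv(units: Tuple[int, ...]) -> bool:
    """Slow path: look for isolated surrogates."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            i += 2  # well-formed surrogate pair
        elif 0xD800 <= u <= 0xDFFF:
            return False  # isolated surrogate
        else:
            i += 1
    return True


def is_usv_sequence(s: EngineString) -> bool:
    # Fast path: trust the cached flag if the engine maintains one.
    if s.usv_flag is not None:
        return s.usv_flag
    return _scan_is_usv(s.units)


def can_pass_without_copy(s: EngineString, both_sides_use_stringref: bool) -> bool:
    """The reference itself can cross the component boundary only if both
    sides use stringrefs and the string has no isolated surrogates.
    Otherwise the caller falls back to copying (not modeled here)."""
    return both_sides_use_stringref and is_usv_sequence(s)


hi = EngineString((0x68, 0x69))
print(can_pass_without_copy(hi, True))                      # True
print(can_pass_without_copy(hi, False))                     # False
print(can_pass_without_copy(EngineString((0xD800,)), True))  # False
```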

wingo, Apr 27 '22

After implementing this in V8, I think that, at least there, we are unlikely to have an isUSVString bit; instead we will have to scan the string's contents, unless the string happens to use the one-byte optimization (all code points less than 256). I therefore propose that we remove this instruction unless there is a proven need for it; its functionality can be had via `string.measure_utf8`.
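A rough model of the cost described here, assuming an engine with two string representations. The class names are hypothetical and do not reflect V8's actual internals. A one-byte string can be answered in constant time, because every code point is below 256 and so cannot be a surrogate (U+D800 to U+DFFF); a two-byte string needs a linear scan.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class OneByteString:
    """Every code point is below 256; stored one byte per character."""
    data: bytes


@dataclass(frozen=True)
class TwoByteString:
    """General case: UTF-16 code units, possibly with isolated surrogates."""
    units: Tuple[int, ...]


EngineString = Union[OneByteString, TwoByteString]


def is_usv_sequence(s: EngineString) -> bool:
    if isinstance(s, OneByteString):
        # Surrogates are U+D800..U+DFFF, far above 255: nothing to scan.
        return True
    # Two-byte strings need a linear scan for isolated surrogates.
    units, i = s.units, 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            i += 2  # well-formed surrogate pair
        elif 0xD800 <= u <= 0xDFFF:
            return False  # isolated surrogate
        else:
            i += 1
    return True


print(is_usv_sequence(OneByteString("café".encode("latin-1"))))  # True
print(is_usv_sequence(TwoByteString((0x3042, 0xDFFF))))          # False
```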

wingo, Sep 12 '22

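Strings using the one-byte representation are common enough that the one-byte optimization is worthwhile in the first place. Since a one-byte string cannot contain an isolated surrogate, `string.is_usv_sequence` can answer immediately for all of them, even if no engine adds an isUSVString bit. Wouldn't that be sufficient to make this instruction worth considering?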

sunfishcode, Sep 12 '22

Good point, @sunfishcode. An optimizer could recognize the compare-against-minus-one pattern, of course, but it's better to just emit the operation we're looking for.
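To illustrate what "recognize the pattern" would involve, here is a toy peephole pass over folded-WAT-style expressions, modeled as Python tuples `(opcode, *operands)`. The tuple encoding and the `peephole` function are assumptions for illustration only. Even this small version has to accept the constant on either side and map `i32.eq` to a negated test, which is part of why emitting `string.is_usv_sequence` directly is simpler.

```python
from typing import Any, Optional

Expr = Any  # hypothetical: (opcode, *operands) tuples, or leaf values


def _is_minus_one(e: Expr) -> bool:
    return e == ("i32.const", -1)


def _measured_operand(e: Expr) -> Optional[Expr]:
    """Return x if e is (string.measure_utf8 x), else None."""
    if isinstance(e, tuple) and len(e) == 2 and e[0] == "string.measure_utf8":
        return e[1]
    return None


def peephole(e: Expr) -> Expr:
    """Rewrite comparisons of string.measure_utf8 against -1, bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [peephole(a) for a in args]
    if op in ("i32.ne", "i32.eq") and len(args) == 2:
        a, b = args
        # Accept the constant on either side of the comparison.
        if _is_minus_one(a):
            x = _measured_operand(b)
        elif _is_minus_one(b):
            x = _measured_operand(a)
        else:
            x = None
        if x is not None:
            test = ("string.is_usv_sequence", x)
            # "!= -1" means "is USV"; "== -1" means "is not USV".
            return test if op == "i32.ne" else ("i32.eqz", test)
    return (op, *args)


print(peephole(("i32.ne", ("i32.const", -1), ("string.measure_utf8", ("local.get", 0)))))
# ('string.is_usv_sequence', ('local.get', 0))
```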

wingo, Sep 12 '22