stringref

Can encoding instructions use memory as a scratch space?

Open wingo opened this issue 3 years ago • 2 comments

If, in the implementation of stringrefs in your wasm VM, you have a managed buffer of WTF-8, and the user requests that you write UTF-8 to memory via string.encode_lossy_utf8, one tactic would be to just memcpy the whole thing, and then go back and change any surrogate to be U+FFFD. (Not saying it's a good strategy, just a possible strategy.) In a single-threaded world, this is fine. Would it be fine with threads? See https://github.com/WebAssembly/threads/issues/189.

wingo, Jul 13 '22 13:07
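
For readers following along, the copy-then-fix-up tactic from the opening comment can be sketched as below. This is a minimal illustration in Python, not any engine's actual code: `encode_lossy_utf8_sketch`, `memory`, and `offset` are hypothetical names, the input is assumed to be well-formed WTF-8, and a `bytearray` stands in for wasm linear memory. It relies on the fact that a surrogate in WTF-8 is the three-byte sequence `ED A0..BF 80..BF`, and that U+FFFD is also three bytes in UTF-8 (`EF BF BD`), so the fix-up never changes the length.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # UTF-8 encoding of U+FFFD


def encode_lossy_utf8_sketch(wtf8: bytes, memory: bytearray, offset: int) -> int:
    """Copy well-formed WTF-8 into `memory`, then patch surrogates in place.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    end = offset + len(wtf8)
    if offset < 0 or end > len(memory):
        # Stand-in for an out-of-bounds trap.
        raise ValueError("write out of bounds")

    # Step 1: bulk copy (the "memcpy"). Surrogates are briefly visible here.
    memory[offset:end] = wtf8

    # Step 2: fix-up pass. 0xED is never a continuation byte, so every hit
    # is the lead byte of a three-byte sequence.
    i = memory.find(b"\xed", offset, end)
    while i >= 0:
        # In valid UTF-8 the byte after 0xED is 0x80..0x9F; 0xA0..0xBF
        # means the sequence encodes a surrogate (U+D800..U+DFFF).
        if i + 2 < end and memory[i + 1] >= 0xA0:
            memory[i:i + 3] = REPLACEMENT
        i = memory.find(b"\xed", i + 3, end)
    return len(wtf8)


# Usage: a lone surrogate between two ASCII letters.
mem = bytearray(8)
src = "a\ud800b".encode("utf-8", "surrogatepass")  # b"a\xed\xa0\x80b"
written = encode_lossy_utf8_sketch(src, mem, 1)
```

Between step 1 and step 2 the destination holds the raw surrogate bytes, which is exactly the window a racing thread could observe.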

IIUC, the complication here in comparison to memory.fill would be the potential for a racing thread to observe the unsanitised surrogate before it's overwritten. We could write the specification for string.encode_lossy_utf8 so that this additional behaviour is permitted without too much trouble if this is a desirable implementation to support (i.e. a racing thread could see arbitrary interleavings of the old data, the unsanitised new data, and the sanitised new data).

conrad-watt, Jul 13 '22 13:07
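
To make the "old data, unsanitised new data, sanitised new data" wording concrete, here is an informal per-byte illustration in Python. It is only a sketch of the permitted observations as described in the comment above, not a model of the wasm threads memory model; `sanitise` and `observable_values` are hypothetical names, and the input is assumed to be well-formed WTF-8.

```python
def sanitise(wtf8: bytes) -> bytes:
    """Replace each WTF-8 surrogate (ED A0..BF xx) with U+FFFD (EF BF BD)."""
    out = bytearray(wtf8)
    i = 0
    while (i := out.find(b"\xed", i)) >= 0:
        if i + 2 < len(out) and out[i + 1] >= 0xA0:
            out[i:i + 3] = b"\xef\xbf\xbd"
        i += 3
    return bytes(out)


def observable_values(old: bytes, wtf8: bytes) -> list[set[int]]:
    """For each destination byte, the values a racing reader might see."""
    if len(old) != len(wtf8):
        raise ValueError("old and new data must cover the same range")
    new = sanitise(wtf8)
    # Each byte independently may still be old, copied-but-unsanitised,
    # or sanitised, since interleavings are arbitrary.
    return [{o, u, s} for o, u, s in zip(old, wtf8, new)]


# Usage: destination initially zeroed, source contains a lone surrogate.
views = observable_values(bytes(5), b"a\xed\xa0\x80b")
```

For bytes outside a surrogate the unsanitised and sanitised values coincide, so the extra behaviour relative to `memory.fill`-style races only shows up on the three bytes of each surrogate.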

> if this is a desirable implementation to support

It's not a big issue either way, but the single-memcpy-plus-fixups implementation is a nice simplification compared to the alternative, so yeah, it would be nice (but not crucial) to support it. You can see the difference here (lines 1274 and following).

jakobkummerow, Jul 13 '22 17:07