UTF-8 validation
UTF-8 is a very popular string encoding; for example, it's the encoding used by over 95% of all Web content. It's not uncommon for applications to need to validate UTF-8 in their own strings. Since every WebAssembly VM already has UTF-8 validation logic built in, as the spec requires, we should define a WASI API that lets applications call into the VM's validator rather than bundling their own.
I'm picturing an API that takes a byte slice as input and returns a boolean indicating whether it's valid. This is the minimum that WebAssembly engines themselves are required to have, and it would be enough for use cases such as implementing a UTF-8 validity check for a WASI API implemented in wasm.
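For a sense of what applications currently have to bundle, here is a minimal Rust sketch of that boolean check. It follows the well-formedness rules of RFC 3629 (no overlong encodings, no UTF-16 surrogates, nothing above U+10FFFF), which are the same rules Rust's `std::str::from_utf8` enforces. The name `is_valid_utf8` is hypothetical; a real WASI binding would use whatever name the proposal settles on.

```rust
/// Hypothetical stand-in for the proposed boolean `validate` API.
/// Returns true if `bytes` is well-formed UTF-8 per RFC 3629.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        // Sequence length plus the allowed range for the *second* byte;
        // the narrowed ranges are what reject overlongs, surrogates and
        // code points above U+10FFFF.
        let (len, lo, hi): (usize, u8, u8) = match b0 {
            0x00..=0x7F => {
                i += 1; // ASCII fast path
                continue;
            }
            0xC2..=0xDF => (2, 0x80, 0xBF),
            0xE0 => (3, 0xA0, 0xBF), // rejects overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80, 0xBF),
            0xED => (3, 0x80, 0x9F), // rejects surrogates U+D800..=U+DFFF
            0xF0 => (4, 0x90, 0xBF), // rejects overlong 4-byte forms
            0xF1..=0xF3 => (4, 0x80, 0xBF),
            0xF4 => (4, 0x80, 0x8F), // rejects > U+10FFFF
            _ => return false,       // 0x80..=0xC1 and 0xF5..=0xFF never start a sequence
        };
        if i + len > bytes.len() {
            return false; // truncated sequence at end of input
        }
        let b1 = bytes[i + 1];
        if b1 < lo || b1 > hi {
            return false;
        }
        // Any remaining bytes must be plain continuation bytes (10xxxxxx).
        for &b in &bytes[i + 2..i + len] {
            if b & 0xC0 != 0x80 {
                return false;
            }
        }
        i += len;
    }
    true
}
```

Only the second byte needs a per-lead-byte range check; that is the usual table-free way to reject every ill-formed sequence without decoding the full code point.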
More elaborate APIs are possible, such as validation which returns the position where an error occurred, and possibly information about the error, but I think it makes sense to start with something simple. I won't have time to make an official proposal myself for a while, but I wanted to file this issue to see what others think!
Thinking about this more, I think it may make sense to have a return value that carries more information.
Taking inspiration from Rust's str::from_utf8 and Utf8Error, I'm picturing an API like this:
(typename $utf8_error
  (record
    (field $valid_up_to $size)
    (field $incomplete bool)
    (field $error_len $size)
  )
)
...
(@interface func (export "validate")
  (param $bytes (list u8))
  (result $error (expected (error $utf8_error)))
)
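To make the record's semantics concrete, here is a sketch of `validate` expressed on top of Rust's `std::str::from_utf8`. `Utf8Error::valid_up_to()` maps directly to `$valid_up_to`, and `Utf8Error::error_len()` returns `None` exactly when the input ended mid-sequence, which is what `$incomplete` captures. Using 0 for `error_len` when `incomplete` is true is an assumption, since the witx above doesn't say what that field holds in that case. `Utf8ErrorRecord` and `ChunkedValidator` are hypothetical names. The second type shows one reason `$incomplete` is worth having: it lets a caller validate a stream chunk by chunk, carrying a trailing partial sequence into the next chunk.

```rust
use std::str;

/// Hypothetical Rust mirror of the proposed `$utf8_error` record.
#[derive(Debug, PartialEq, Eq)]
pub struct Utf8ErrorRecord {
    pub valid_up_to: usize,
    pub incomplete: bool,
    pub error_len: usize,
}

/// Hypothetical `validate`, implemented with std's validator.
pub fn validate(bytes: &[u8]) -> Result<(), Utf8ErrorRecord> {
    match str::from_utf8(bytes) {
        Ok(_) => Ok(()),
        Err(e) => Err(Utf8ErrorRecord {
            valid_up_to: e.valid_up_to(),
            // error_len() is None only when the input ends mid-sequence.
            incomplete: e.error_len().is_none(),
            // Assumption: 0 when incomplete, since there is no length to report.
            error_len: e.error_len().unwrap_or(0),
        }),
    }
}

/// Validates a byte stream delivered in arbitrary chunks.
#[derive(Default)]
pub struct ChunkedValidator {
    carry: Vec<u8>, // trailing partial sequence (at most 3 bytes)
}

impl ChunkedValidator {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns false on an invalid sequence; callers should stop there.
    pub fn push(&mut self, chunk: &[u8]) -> bool {
        let mut buf = std::mem::take(&mut self.carry);
        buf.extend_from_slice(chunk);
        match validate(&buf) {
            Ok(()) => true,
            // Keep the unfinished tail and retry once more bytes arrive.
            Err(e) if e.incomplete => {
                self.carry = buf[e.valid_up_to..].to_vec();
                true
            }
            Err(_) => false,
        }
    }

    /// At end of stream, leftover bytes mean the input was truncated.
    pub fn finish(self) -> bool {
        self.carry.is_empty()
    }
}
```

A boolean-only API can't support this pattern, because the caller can't distinguish "truncated at the end of this chunk" from "invalid".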
Hi,
I'm a little bit confused by this.
Why would a language's standard library call this instead of using the UTF-8 validation function it already has and uses for all other targets?
@jedisct1 Because of the way the web works, file size is very important, so relying on the browser's built-in functionality means smaller file sizes.