Consider adopting `simdutf` as a possible transcoding backend

Open: DJm00n opened this issue 2 years ago • 1 comment

This library provides fast Unicode functions such as:

- ASCII, UTF-8, UTF-16LE/BE and UTF-32 validation, with and without error identification,
- Latin1 to UTF-8 transcoding,
- Latin1 to UTF-16LE/BE transcoding,
- Latin1 to UTF-32 transcoding,
- UTF-8 to Latin1 transcoding, with or without validation, with and without error identification,
- UTF-8 to UTF-16LE/BE transcoding, with or without validation, with and without error identification,
- UTF-8 to UTF-32 transcoding, with or without validation, with and without error identification,
- UTF-16LE/BE to Latin1 transcoding, with or without validation, with and without error identification,
- UTF-16LE/BE to UTF-8 transcoding, with or without validation, with and without error identification,
- UTF-32 to Latin1 transcoding, with or without validation, with and without error identification,
- UTF-32 to UTF-8 transcoding, with or without validation, with and without error identification,
- UTF-32 to UTF-16LE/BE transcoding, with or without validation, with and without error identification,
- UTF-16LE/BE to UTF-32 transcoding, with or without validation, with and without error identification,
- From a UTF-8 string, compute the size of the Latin1 equivalent string,
- From a UTF-8 string, compute the size of the UTF-16 equivalent string,
- From a UTF-8 string, compute the size of the UTF-32 equivalent string (equivalent to UTF-8 character counting),
- From a UTF-16LE/BE string, compute the size of the Latin1 equivalent string,
- From a UTF-16LE/BE string, compute the size of the UTF-8 equivalent string,
- From a UTF-32 string, compute the size of the UTF-8 or UTF-16LE equivalent string,
- From a UTF-16LE/BE string, compute the size of the UTF-32 equivalent string (equivalent to UTF-16 character counting),
- UTF-8 and UTF-16LE/BE character counting,
- UTF-16 endianness change (UTF-16LE/BE to UTF-16BE/LE).
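To make a few of these entries concrete, here is a plain scalar sketch of what they compute. It is not simdutf's API and uses no SIMD; the function names are hypothetical, and the length helpers assume the input is already valid UTF-8. The UTF-32 size of a UTF-8 string is just its code point count (every byte that is not a `10xxxxxx` continuation byte starts a code point), and the UTF-16 size adds one extra unit for each 4-byte sequence, because those code points become surrogate pairs.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical scalar helper (assumes valid UTF-8 input).
// UTF-32 length == number of code points == number of non-continuation bytes.
std::size_t count_utf8_scalar(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;  // skip continuation bytes 10xxxxxx
    }
    return n;
}

// Hypothetical scalar helper (assumes valid UTF-8 input).
// One UTF-16 unit per code point, plus one more for every 4-byte sequence
// (lead byte 11110xxx), since those code points need a surrogate pair.
std::size_t utf16_length_from_utf8_scalar(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;  // start of a code point
        if (c >= 0xF0) ++n;           // 4-byte lead: extra surrogate unit
    }
    return n;
}

// Hypothetical scalar helper: swap the two bytes of every UTF-16 code unit,
// turning UTF-16LE into UTF-16BE or vice versa.
std::u16string change_endianness_utf16_scalar(std::u16string_view in) {
    std::u16string out;
    out.reserve(in.size());
    for (char16_t u : in) {
        out.push_back(static_cast<char16_t>((u >> 8) | (u << 8)));
    }
    return out;
}
```

For example, `"a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80"` ("a", "é", "€", and one emoji) holds 4 code points but needs 5 UTF-16 units.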

The functions are accelerated using SIMD instructions (e.g., ARM NEON, SSE, AVX, AVX-512). For strings of hundreds of characters or more, the library can often transcode at speeds exceeding a billion characters per second.
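The speedup comes from examining many bytes per instruction instead of one. As a rough, portable illustration of that idea, here is ASCII validation done with SWAR ("SIMD within a register") on 64-bit words. This is not the NEON/SSE/AVX code simdutf actually uses, and `is_ascii_swar` is a hypothetical name: it ORs eight bytes at a time into an accumulator and then checks whether any byte ever had its high bit set.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical SWAR sketch: returns true if every byte is < 0x80 (ASCII).
bool is_ascii_swar(std::string_view s) {
    const char* p = s.data();
    const std::size_t n = s.size();
    std::uint64_t acc = 0;
    std::size_t i = 0;
    // Main loop: fold 8 bytes per iteration into the accumulator.
    // memcpy avoids undefined behavior from unaligned or aliased loads.
    for (; i + 8 <= n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, p + i, 8);
        acc |= word;
    }
    // Tail: fold the remaining 0-7 bytes one at a time.
    for (; i < n; ++i) {
        acc |= static_cast<unsigned char>(p[i]);
    }
    // A non-ASCII byte sets its high bit, which survives the ORs and
    // is caught by checking the high bit of every byte lane.
    return (acc & 0x8080808080808080ULL) == 0;
}
```

Real SIMD code applies the same reduce-then-test pattern with 16-, 32- or 64-byte registers, which is why throughput grows with string length.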

See https://github.com/simdutf/simdutf

Sep 20 '23 17:09 DJm00n

It can already be used separately, so why adopt it here?

Dec 23 '23 14:12 MBkkt