Daniel Lemire
You are almost certainly mixing an old code generator with a more recent compiler. Linux distributions upgrade the two in sync, but you can design...
Note that simdutf is used by Node.js and Bun, and various other systems, in production... and it has been used in production for several years. We also have fast base64...
> It's around 7x~8x times boost compared to current implementation in SR. Interesting.
@kevincai Indeed. It is within my expectations... meaning that I expect that your results are correct. Yet these are good results.
Can you elaborate? If you mean iterating over the code point values, we do fast transcoding to UTF-32. After UTF-32 transcoding, iteration is trivial. For large inputs, it might make...
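To make the "transcode to UTF-32, then iterate" idea concrete, here is a minimal scalar sketch. It is not the simdutf implementation: `utf8_to_utf32` and `count_non_ascii` are hypothetical helper names, and the decoder assumes the input has already been validated as UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 that is ASSUMED to be valid into UTF-32 code points.
// Hypothetical scalar helper for illustration; it performs no validation.
std::vector<char32_t> utf8_to_utf32(const std::string &utf8) {
  std::vector<char32_t> out;
  out.reserve(utf8.size()); // never more code points than input bytes
  size_t i = 0;
  while (i < utf8.size()) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    // Number of continuation bytes implied by the lead byte.
    size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    // Payload bits of the lead byte: 7, 5, 4 or 3 bits.
    char32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
    for (size_t k = 1; k <= extra && i + k < utf8.size(); k++) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += extra + 1;
  }
  return out;
}

// Once the text is UTF-32, every array element is exactly one code point,
// so iteration is a plain loop.
size_t count_non_ascii(const std::vector<char32_t> &code_points) {
  size_t n = 0;
  for (char32_t cp : code_points) {
    if (cp > 0x7F) {
      n++;
    }
  }
  return n;
}
```

The trade-off is memory: the UTF-32 buffer uses four bytes per code point, which is the price paid for trivial iteration.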
Interesting.
Please see `validate_utf8_with_errors` which implements the functionality you refer to. We will eagerly consider a pull request. cc @Nick-Nuon
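For readers unfamiliar with the idea of validation that also reports where the input goes wrong, here is a rough scalar sketch. It is not simdutf's code or its exact result type: `utf8_check` and `check_utf8` are hypothetical names, and this version reports the index of the first byte of the offending sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: ok == true means the whole input is valid and
// position == input size; otherwise position is the start of the bad sequence.
struct utf8_check {
  bool ok;
  size_t position;
};

utf8_check check_utf8(const std::string &s) {
  size_t i = 0;
  while (i < s.size()) {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) { // ASCII
      i++;
      continue;
    }
    size_t extra;
    char32_t cp, min_cp;
    if ((lead & 0xE0) == 0xC0) {
      extra = 1; cp = lead & 0x1F; min_cp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2; cp = lead & 0x0F; min_cp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3; cp = lead & 0x07; min_cp = 0x10000;
    } else {
      return {false, i}; // stray continuation byte or invalid lead byte
    }
    if (i + extra >= s.size()) {
      return {false, i}; // sequence is cut off by the end of the input
    }
    for (size_t k = 1; k <= extra; k++) {
      unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) {
        return {false, i}; // expected a continuation byte
      }
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return {false, i};
    }
    i += extra + 1;
  }
  return {true, s.size()};
}
```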
The `simdutf` library has always worked in a two-pass model: first we compute how much memory is needed, and then we transcode.
```cpp
size_t expected_utf8words = simdutf::utf8_length_from_latin1(latin1_output.get(), latin1words);
std::unique_ptr utf8_output{...
```
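The two-pass pattern can be illustrated without the library. The following is a scalar sketch for Latin-1 to UTF-8 only: `latin1_to_utf8_length`, `latin1_to_utf8` and `transcode_latin1_to_utf8` are hypothetical stand-ins rather than simdutf functions, and they show the shape of the approach, not its speed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Pass 1: count the UTF-8 bytes needed for a Latin-1 input.
// Bytes below 0x80 become one UTF-8 byte, all others become two.
size_t latin1_to_utf8_length(const char *input, size_t length) {
  size_t bytes = length;
  for (size_t i = 0; i < length; i++) {
    bytes += static_cast<unsigned char>(input[i]) >> 7;
  }
  return bytes;
}

// Pass 2: transcode into a buffer that the caller has already sized.
// Returns the number of bytes written.
size_t latin1_to_utf8(const char *input, size_t length, char *output) {
  char *start = output;
  for (size_t i = 0; i < length; i++) {
    unsigned char c = static_cast<unsigned char>(input[i]);
    if (c < 0x80) {
      *output++ = static_cast<char>(c);
    } else {
      *output++ = static_cast<char>(0xC0 | (c >> 6));
      *output++ = static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return static_cast<size_t>(output - start);
}

// Both passes together: size the output, allocate once, then transcode.
std::string transcode_latin1_to_utf8(const std::string &latin1) {
  size_t expected = latin1_to_utf8_length(latin1.data(), latin1.size());
  std::unique_ptr<char[]> utf8_output{new char[expected]};
  size_t written =
      latin1_to_utf8(latin1.data(), latin1.size(), utf8_output.get());
  assert(written == expected); // the length pass must match the transcode
  return std::string(utf8_output.get(), written);
}
```

Because the first pass gives the exact output size, the transcoding loop needs no bounds checks or reallocation.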
> x86-64 has only 16 sse registers, how could it be 20+?
These are the named (architectural) registers, but the processor has many more physical registers. You can examine the issue experimentally... https://lemire.me/blog/2022/06/07/memory-level-parallelism-intel-ice-lake-versus-amazon-graviton-3/
@aqrit A fun one is this PR: https://github.com/simdutf/simdutf/pull/318 The westmere kernel (which is currently just scalar code, but subject to autovectorization) is faster than a reasonable hand-coded AVX2 routine. It...