Daniel Lemire comments

Results: 1864 comments of Daniel Lemire

Consider replacing src/util/simdutf8check.h

It is almost certainly the case that you are mixing an old code generator with a more recent compiler. Linux distributions upgrade the two in sync, but you can design...

Consider replacing src/util/simdutf8check.h

Note that simdutf is used by Node.js and Bun, and various other systems, in production... and it has been used in production for several years. We also have fast base64...

Consider replacing src/util/simdutf8check.h

> It's around 7x~8x times boost compared to current implementation in SR.

Interesting.

Consider replacing src/util/simdutf8check.h

@kevincai Indeed. It is within my expectations... meaning that I expect that your results are correct. Yet these are good results.

Any plan to add string iterator?

Can you elaborate? If you mean iterating over the code point values, we do fast transcoding to UTF-32. After UTF-32 transcoding, iteration is trivial. For large inputs, it might make...
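To make the "transcode, then iterate" idea concrete, here is a minimal scalar sketch. It is not simdutf's implementation or API: `decode_utf8_to_utf32` and `count_non_ascii` are hypothetical helpers written for illustration, and the decoder assumes the input has already been validated as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scalar helper (not part of simdutf): decodes UTF-8 into
// UTF-32 code points. Assumes the input was validated beforehand.
std::vector<char32_t> decode_utf8_to_utf32(const std::string& in) {
  std::vector<char32_t> out;
  out.reserve(in.size());  // upper bound: at most one code point per byte
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    std::size_t len;
    char32_t cp;
    if (b0 < 0x80)      { len = 1; cp = b0; }         // ASCII
    else if (b0 < 0xE0) { len = 2; cp = b0 & 0x1F; }  // 2-byte sequence
    else if (b0 < 0xF0) { len = 3; cp = b0 & 0x0F; }  // 3-byte sequence
    else                { len = 4; cp = b0 & 0x07; }  // 4-byte sequence
    assert(i + len <= in.size());  // precondition: valid, complete input
    for (std::size_t k = 1; k < len; ++k) {
      // Each continuation byte contributes its low 6 bits.
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}

// Once the text is UTF-32, iterating over code points is a plain loop.
std::size_t count_non_ascii(const std::vector<char32_t>& code_points) {
  std::size_t n = 0;
  for (char32_t cp : code_points) {
    if (cp > 0x7F) { ++n; }
  }
  return n;
}
```

The trade-off in this sketch is memory: the UTF-32 buffer uses four bytes per code point, so the approach costs extra space in exchange for trivial iteration.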

Any plan to add string iterator?

Interesting.

utf8 validator improvements

Please see `validate_utf8_with_errors` which implements the functionality you refer to. We will eagerly consider a pull request. cc @Nick-Nuon
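For readers unfamiliar with validation that also reports where the input goes wrong, the following scalar sketch shows the general idea. It is a stand-in under stated assumptions, not simdutf's code: the `utf8_check` struct and `validate_utf8_with_position` function are hypothetical names, and the real `validate_utf8_with_errors` has its own return type that should be checked in the simdutf documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: whether the input is valid and, if not,
// the byte offset where the first invalid sequence starts.
struct utf8_check {
  bool valid;
  std::size_t position;  // offset of the first error, or input length if valid
};

// Hypothetical scalar validator (illustration only, not simdutf's code).
utf8_check validate_utf8_with_position(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) { ++i; continue; }  // ASCII fast path
    std::size_t len;
    unsigned int cp;
    unsigned int min;  // smallest code point allowed for this length
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else { return {false, i}; }  // stray continuation or invalid lead byte
    if (i + len > n) { return {false, i}; }  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) { return {false, i}; }  // not a continuation
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, values above U+10FFFF, and surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return {false, i};
    }
    i += len;
  }
  assert(i == n);  // the loop only advances by complete sequences
  return {true, n};
}
```

A caller can use the reported position to keep the valid prefix or to produce a precise error message.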

`convert_latin1_to_utf8` doesn't accept length field for `utf8_output` pointer

The `simdutf` library has always worked in a two-pass model: first we compute how much memory is needed, and then we transcode.

```cpp
size_t expected_utf8words = simdutf::utf8_length_from_latin1(latin1_output.get(), latin1words);
std::unique_ptr<char[]> utf8_output{...
```
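The two-pass model can be sketched end to end with plain scalar code. This is only an illustration of the pattern, assuming nothing about simdutf's internals: `latin1_utf8_length`, `latin1_to_utf8`, and `latin1_to_utf8_string` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Pass 1 (hypothetical helper): number of UTF-8 bytes needed for a Latin-1
// input. Bytes below 0x80 take one byte; all others take two.
std::size_t latin1_utf8_length(const char* in, std::size_t len) {
  std::size_t out = len;
  for (std::size_t i = 0; i < len; ++i) {
    if (static_cast<unsigned char>(in[i]) >= 0x80) { ++out; }
  }
  return out;
}

// Pass 2 (hypothetical helper): transcode into a buffer the caller has
// already sized using pass 1. Returns the number of bytes written.
std::size_t latin1_to_utf8(const char* in, std::size_t len, char* out) {
  char* p = out;
  for (std::size_t i = 0; i < len; ++i) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {
      *p++ = static_cast<char>(c);
    } else {
      *p++ = static_cast<char>(0xC0 | (c >> 6));    // lead byte
      *p++ = static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
    }
  }
  return static_cast<std::size_t>(p - out);
}

// Putting the two passes together: size the buffer, then transcode.
std::string latin1_to_utf8_string(const std::string& latin1) {
  const std::size_t needed = latin1_utf8_length(latin1.data(), latin1.size());
  std::unique_ptr<char[]> buf{new char[needed]};
  const std::size_t written =
      latin1_to_utf8(latin1.data(), latin1.size(), buf.get());
  assert(written == needed);  // pass 1 and pass 2 must agree
  return std::string(buf.get(), written);
}
```

Because the caller sizes the output exactly before transcoding, the conversion routine in this sketch does not need a separate output-length parameter.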

investigate loop unrolling

> x86-64 has only 16 sse registers, how could it be 20+?

Those are the named (architectural) registers, but the processor has many more physical registers. You can examine the issue experimentally... https://lemire.me/blog/2022/06/07/memory-level-parallelism-intel-ice-lake-versus-amazon-graviton-3/

investigate loop unrolling

@aqrit A fun one is this PR: https://github.com/simdutf/simdutf/pull/318 The westmere kernel (which is currently just scalar code, but subject to autovectorization) is faster than a reasonable hand-coded AVX2 routine. It...