Daniel Lemire
@easyaspi314 Sure! Saving 2 kB would be huge if it is performance neutral. Can you try it out? I recommend you run benchmarks too.
We care a lot about ARM.
> Well an initial test on an ARM Cortex-X1 (yes it is my phone) + Clang 16 shows about 5% overhead and probably can...
Running tests.
> is definitely worth a 5% perf loss

Glancing at the code, I am not sure that this should cause a 5% perf loss. It is fairly difficult to measure...
Here are my results (compare with above) on this PR.... Before...
```
./build/benchmarks/benchmark -P convert_utf8_to_utf16+westmere -F unicode_lipsum/lipsum/*.utf8.txt | grep GB
2.252 GB/s (3.6 %) 1.262 Gc/s 1.78 byte/char
3.196 GB/s...
```
My concern is that, in my naive view, your PR should be performance neutral... but it seems that it is not. I see a measurable impact (up to 10%)....
Intriguing. Is there some kind of specification for such outputs?
@victor1234 Do you know why it fails under Visual Studio?
Yes, it looks good: https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.vector/jdk/incubator/vector/ByteVector.html It should be just one short function.
Yes, a pull request to provide this functionality in C is invited.