Daniel Lemire
I suggest aliasing UTF-16 to UTF-16LE.
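As a rough sketch of what such an alias could look like (the helper name `canonical_encoding` and the normalization rules are assumptions for illustration, not code from `sutf.cpp`):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not taken from sutf.cpp: normalize an encoding name
// and treat the endianness-free "UTF16" as little-endian.
std::string canonical_encoding(const std::string& name) {
  std::string key;
  for (unsigned char c : name) {
    if (c == '-' || c == '_') continue;  // accept UTF-16LE, UTF_16LE, UTF16LE
    key.push_back(static_cast<char>(std::toupper(c)));
  }
  if (key == "UTF16") return "UTF16LE";  // the suggested alias
  return key;
}
```

With this, `canonical_encoding("UTF-16")` and `canonical_encoding("utf16le")` both yield `UTF16LE`, so the rest of the tool only has to handle one spelling.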
There may be a validation issue...

```
./build/tools/sutf -f UTF16LE -t UTF8 unicode_lipsum/lipsum/Russian-Lipsum.utf16.txt
```

Produces `Error with iconv.` after seemingly producing the output.
It will not help per se, but theoretically,

```C++
std::vector<char16_t> output_buffer(2*size);
```

can be replaced with

```C++
std::unique_ptr<char16_t[]> output_buffer(new char16_t[2*size]);
```

for higher performance... because the `unique_ptr` way does not...
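One difference between the two declarations that is certain: constructing a `std::vector` with a size value-initializes (zeroes) every element, whereas `new char16_t[n]` leaves the elements uninitialized. A minimal sketch, with hypothetical helper names:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Value-initializes every element to 0, so the whole buffer is written
// once before any conversion output is stored in it.
std::vector<char16_t> make_zeroed_buffer(std::size_t size) {
  return std::vector<char16_t>(2 * size);
}

// Default-initializes: the elements hold indeterminate values and must be
// written before they are read. No up-front pass over the memory.
std::unique_ptr<char16_t[]> make_uninitialized_buffer(std::size_t size) {
  return std::unique_ptr<char16_t[]>(new char16_t[2 * size]);
}
```

In C++20, `std::make_unique_for_overwrite<char16_t[]>(2 * size)` expresses the same intent without a naked `new`.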
So if I just have a silly file...

```
$ ls -al hello.txt
-rw-rw-r-- 1 lemire lemire 6 Aug 9 01:27 hello.txt
```

Then iconv is significantly faster...

```
$...
```
Suppose I nuke your main function...

```diff
$ git diff
diff --git a/tools/sutf.cpp b/tools/sutf.cpp
index ee909ef..2860fb6 100644
--- a/tools/sutf.cpp
+++ b/tools/sutf.cpp
@@ -245,12 +245,13 @@ void CommandLine::show_formats() {
 }
 int...
```
So I think that the slow startup is due to dynamic linking... Here is a hack that might drastically speed things up...

```CMake
$ git diff
diff --git a/CMakeLists.txt...
```
(Note that my patch is not recommended; it is just a demonstration.)
This patch might be reasonable...

```diff
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index f51d9ef..c8a59c0 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -2,6 +2,8 @@ cmake_minimum_required(VERSION 3.15)
 add_executable(sutf sutf.cpp)
 target_link_libraries(sutf PUBLIC simdutf)
-
+if(NOT(MSVC))...
```
@NicolasJiaxin If you try statically linking the C++ standard library, it should help quite a bit. Then I recommend trying to use (fixed) stack-allocated buffers. There is a bunch...
@clausecker Can you propose a code sample?