jumanpp
jumanpp copied to clipboard
MSVC 2017 build fails with x86 architecture (v2.0.0-rc2)
Using MSVC 2017 Community on Windows 10 and CMake 3.11.3, I am trying to build Windows binary from jumanpp-2.0.0-rc2.tar.xz. However, I have some trouble to ask the developers for help. (On Linux, I got working binary from the same source. Many thanks!)
Procedure
- Install MSVC 2017 Comunnity version (chose default set for Desktop development with C++) and CMake 3.11.3 (chose Windows win64-x64 ZIP as the installer gave an error).

- Download jumanpp-2.0.0-rc2.tar.xz and extract into C:\data\jumanpp-2.0.0-rc2 (I used 7-Zip 18.05).
- As instructed https://github.com/ku-nlp/jumanpp#building-from-a-package , open a terminal and do the following with the result.

C:\data\jumanpp-2.0.0-rc2>mkdir bld
C:\data\jumanpp-2.0.0-rc2>cd bld
C:\data\jumanpp-2.0.0-rc2\bld>cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX="C:\Program Files\jumanpp"
-- Building for: Visual Studio 15 2017
-- Selecting Windows SDK version 10.0.17134.0 to target Windows 10.0.16299.
-- The C compiler identification is MSVC 19.14.26430.0
-- The CXX compiler identification is MSVC 19.14.26430.0
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Could NOT find Protobuf (missing: Protobuf_LIBRARIES Protobuf_INCLUDE_DIR)
Not using Protobuf-based components
-- Eigen3... OK
Static feature generation:
COMMAND cgtest02_codegen_binary cgtest02 Test02 C:/data/jumanpp-2.0.0-rc2/bld/src/core/codegen/gen
DEPENDS cgtest02_codegen_binary cg_2_spec.h
OUTPUT C:/data/jumanpp-2.0.0-rc2/bld/src/core/codegen/gen/cgtest02.cc C:/data/jumanpp-2.0.0-rc2/bld/src/core/codegen/gen/cgtest02.h
LIBS jpp_core
Static feature generation:
COMMAND jpp_jumandic_cg_codegen_binary jpp_jumandic_cg JumandicStatic C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen
DEPENDS jpp_jumandic_cg_codegen_binary shared/jumandic_spec.cc
OUTPUT C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen/jpp_jumandic_cg.cc C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen/jpp_jumandic_cg.h
LIBS jpp_jumandic_spec
C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen
-- Configuring done
-- Generating done
-- Build files have been written to: C:/data/jumanpp-2.0.0-rc2/bld
- Doubleclick "C:\data\jumanpp-2.0.0-rc2\bld\jumanpp.sln" and launch MSVC 2017.
- After some messages and project loading is done, choose ALL_BUILD from Solution Explorer and build with solution configuration of Release (changed from Debug) and Win32 (as default).

- After a while, I get 19 errors and many warnings and the build fails. Attached gz file is the exact output of the build (1210 lines long).
jumanpp-build-log.txt.gz
Since Linux build is fine and it actually works pretty well, something on my Windows is not right. Some advice would be really appreciated.
I am using the same procedure (the only difference is that I was using mingw-msys64 cmake and not the Windows installer) and it works (with AppVeyor CI as well). I'll try to dig into it.
A question: does this script work for you? (A build out of MSVS GUI, but still using MSVC toolkit).
mkdir build-dir
cd build-dir
cmake -G "Visual Studio 15 2017 Win64" ..
cmake --build . --config Release
ctest -C Release --output-on-failure
Wow! I did exactly you suggested me to do and voila! Only ctest output is enough, I think, but in case you need I attach the whole log file (6272 line long).
C:\data\jumanpp-2.0.0-rc2\build-dir>ctest -C Release --output-on-failure
Test project C:/data/jumanpp-2.0.0-rc2/build-dir
Start 1: jpp_util_test
1/10 Test #1: jpp_util_test .................... Passed 0.07 sec
Start 2: testing_infra_test
2/10 Test #2: testing_infra_test ............... Passed 0.05 sec
Start 3: jpp_core_tests
3/10 Test #3: jpp_core_tests ................... Passed 0.27 sec
Start 4: jpp_core_analysis_tests
4/10 Test #4: jpp_core_analysis_tests .......... Passed 0.72 sec
Start 5: jpp_codegen_tests
5/10 Test #5: jpp_codegen_tests ................ Passed 0.09 sec
Start 6: jpp_core_spec_tests
6/10 Test #6: jpp_core_spec_tests .............. Passed 0.06 sec
Start 7: jpp_core_train_tests
7/10 Test #7: jpp_core_train_tests ............. Passed 0.20 sec
Start 8: jpp_jumandic_tests
8/10 Test #8: jpp_jumandic_tests ............... Passed 0.33 sec
Start 9: jpp_bug_tests
9/10 Test #9: jpp_bug_tests .................... Passed 0.14 sec
Start 10: jpp_rnn_tests
10/10 Test #10: jpp_rnn_tests .................... Passed 0.22 sec
100% tests passed, 0 tests failed out of 10
Total Test time (real) = 2.24 sec
Some further questions:
-
After the build is done, I manually grab "{where source is located}\jumanpp-2.0.0-rc2\build-dir\src\jumandic\Release\jumanpp_v2.exe" and "{where source is located}\jumanpp-2.0.0-rc2\model\jumandic.jppmdl" and put them into "C:\Program Files\jumanpp" and "C:\Program Files\jumanpp\libexec\jumanpp" respectively (as config file "{where source is located}\jumanpp-2.0.0-rc2\model\jumandic.conf.in" suggested to put the dic file like that). Is there a command to install after build?
-
I also copy the config file and put it in the binary directory, rename it to jumandic.conf and try to run, but fails (tried with original file name as well). I set a model option in command line right now. How can I direct the binary to use the config file automatically without -c option? Using -c option to specify the config file fails because I put it under C:\Program Files\jumanpp which include a space. Escaping a space by \ does not seem to work for me. Strange escape is not even needed in command line when double quoted.
C:\Program Files\jumanpp>jumanpp_v2.exe -c jumandic.conf sample.txt
failed to load model from disk: InvalidState: could not access file: "C:/Program errcode=123
backtrace:
jumanpp::util::MappedFile::open at c:\data\jumanpp-2.0.0-rc2\src\util\mmap_impl_win32.h:66
jumanpp::core::model::FilesystemModel::open at c:\data\jumanpp-2.0.0-rc2\src\core\impl\model_io.cc:111
jumanpp::core::JumanppEnv::loadModel at c:\data\jumanpp-2.0.0-rc2\src\core\env.cc:11
jumanpp::jumandic::JumanppExec::init at c:\data\jumanpp-2.0.0-rc2\src\jumandic\shared\jumandic_env.cc:26
Here is my content of jumandoc.conf
--model="C:/Program Files/jumanpp/libexec/jumanpp/jumandic.jppmdl"
--rnn-nce-bias=5.62844432562
--rnn-unk-constant=-3.4748115191
--rnn-unk-length=-2.92994951022
--feature-weight-perceptron=1
--feature-weight-rnn=0.0176
- Is there any way to control output character code? Above terminal is mintty. Windows default terminal won't display the output correctly.
Thank you!
Nice to know that the build have passed. That means that at least MSBuild files generated by CMake are sane. I wonder that is the problem with the VS Solution.
- We want to build an installer for Windows. #81. Installing from the CMake source build... is a bit advanced thing on Windows.
- Thats a known bug :P. Please don't use spaces in the path names for the time being. I will fix it together with #83.
- I do not plan to add any character code convertion functionality in the binary itself. You can use (or write) a wrapper script if you want.
Thank you for clearing those things out. Totally understandable especially about output code 3. This issue is now solved, I think.
I have looked over your build log once more. It seems that by default CMake tries to generate build files for 32-bit. I haven't tested that configuration.
It probably should work, but a lot of Juman++ internal optimizations assume that we are using a 64-bit platform and have a sane number of 64-bit registers, which is obviously false on x86.
I once tried
cmake -G "Visual Studio 15 2017" ..
but not
cmake -G "Visual Studio 15 2017 Win64" ..
So... Win64 makes the difference, I see.
Visual Studio 15 2017 is always 32 bit project and you need append Win64 for 64-bit builds
It should be possible to add warning to CMake to generate at least warning in such cases
Win32 build should not fail. It is a bug. It should be significantly slower (my guess is ~10x) than Win64 version though.
Oh ok, I can check it out on my windows laptop since I have toolchain
Tests still fail, so I reopen this issue.
Still, I don't know if this is a good idea to support 32 bit architecture at all. The problem is: Juman++ memory maps analysis model and it can be large: 300m. When embedding this thing into other applications you can run out of virtual memory very fast on 32 bits.
Also, hashing algorithm I use assumes that we have a lot of 64-bit integer registers. That will cause slowdown on x32 (I don't know how much though). Hashing is the cornerstone of the analysis speed by the way.
@eiennohito Yeah, sorry, I suspect yesterday I tested it incorrectly.
Looking at tests:
TEST_CASE("murmur hash works with strings") {
auto string = "this is a test string";
auto ptr = reinterpret_cast<const jumanpp::u8*>(string);
auto hv = murmurhash3_memory(ptr, ptr + sizeof(string), 10);
CHECK(hv == 0x2e62c68031e4659dULL);
}
Even tests themself assume 64 bit integers, but internally murmur_hash uses size_t so you can easily lose accuracy
That's a bug, it should use uint64_t. Otherwise models will not be portable.
There are actually lots of warnings about implicit casts between size_t and uint64_t so to be sure I'd suggest to go through all of them... but it is lots to take on, so might take a while
At least MurmurHash internal hashing representation should use fixed 64 bit integers.
But sized integer hygiene is... not good, yep.
@eiennohito Just letting you know I for now give up with murmurhash3_memory tests, I just do not see where narrowing could happen that starts failing test.
I doubt it is critical issue since x64 is recommended to be used, but well maybe someone else will try to tackle it. Or me at later time.
Thanks for digging. I'm busy with a paper and upcoming PhD thesis right now, so I can't guarantee when I look into it, but I will try.
If your project is sorta largish, I strongly recommend you to use Juman++ via gRPC. You will be protected from crashes and other weird things that way.
https://github.com/eiennohito/jumanpp-grpc
