jumanpp icon indicating copy to clipboard operation
jumanpp copied to clipboard

MSVC 2017 build fails with x86 architecture (v2.0.0-rc2)

Open taxen opened this issue 7 years ago • 17 comments

Using MSVC 2017 Community on Windows 10 and CMake 3.11.3, I am trying to build Windows binary from jumanpp-2.0.0-rc2.tar.xz. However, I have some trouble to ask the developers for help. (On Linux, I got working binary from the same source. Many thanks!)

Procedure

  1. Install MSVC 2017 Comunnity version (chose default set for Desktop development with C++) and CMake 3.11.3 (chose Windows win64-x64 ZIP as the installer gave an error). jumanpp01 jumanpp02 jumanpp05 jumanpp03
  2. Download jumanpp-2.0.0-rc2.tar.xz and extract into C:\data\jumanpp-2.0.0-rc2 (I used 7-Zip 18.05).
  3. As instructed https://github.com/ku-nlp/jumanpp#building-from-a-package , open a terminal and do the following with the result. jumanpp04
C:\data\jumanpp-2.0.0-rc2>mkdir bld

C:\data\jumanpp-2.0.0-rc2>cd bld

C:\data\jumanpp-2.0.0-rc2\bld>cmake ..  -DCMAKE_BUILD_TYPE=Release    -DCMAKE_INSTALL_PREFIX="C:\Program Files\jumanpp"
-- Building for: Visual Studio 15 2017
-- Selecting Windows SDK version 10.0.17134.0 to target Windows 10.0.16299.
-- The C compiler identification is MSVC 19.14.26430.0
-- The CXX compiler identification is MSVC 19.14.26430.0
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.14.26428/bin/Hostx86/x86/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Could NOT find Protobuf (missing: Protobuf_LIBRARIES Protobuf_INCLUDE_DIR)
Not using Protobuf-based components
-- Eigen3... OK

    Static feature generation:
        COMMAND cgtest02_codegen_binary cgtest02 Test02 C:/data/jumanpp-2.0.0-rc2/bld/src/core/codegen/gen
        DEPENDS cgtest02_codegen_binary cg_2_spec.h
        OUTPUT C:/data/jumanpp-2.0.0-rc2/bld/src/core/codegen/gen/cgtest02.cc C:/data/jumanpp-2.0.0-rc2/bld/src/core/codegen/gen/cgtest02.h
        LIBS jpp_core


    Static feature generation:
        COMMAND jpp_jumandic_cg_codegen_binary jpp_jumandic_cg JumandicStatic C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen
        DEPENDS jpp_jumandic_cg_codegen_binary shared/jumandic_spec.cc
        OUTPUT C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen/jpp_jumandic_cg.cc C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen/jpp_jumandic_cg.h
        LIBS jpp_jumandic_spec

C:/data/jumanpp-2.0.0-rc2/bld/src/jumandic/gen
-- Configuring done
-- Generating done
-- Build files have been written to: C:/data/jumanpp-2.0.0-rc2/bld
  1. Doubleclick "C:\data\jumanpp-2.0.0-rc2\bld\jumanpp.sln" and launch MSVC 2017.
  2. After some messages and project loading is done, choose ALL_BUILD from Solution Explorer and build with solution configuration of Release (changed from Debug) and Win32 (as default). jumanpp06
  3. After a while, I get 19 errors and many warnings and the build fails. Attached gz file is the exact output of the build (1210 lines long). jumanpp07 jumanpp-build-log.txt.gz

Since Linux build is fine and it actually works pretty well, something on my Windows is not right. Some advice would be really appreciated.

taxen avatar Jun 06 '18 13:06 taxen

I am using the same procedure (the only difference is that I was using mingw-msys64 cmake and not the Windows installer) and it works (with AppVeyor CI as well). I'll try to dig into it.

A question: does this script work for you? (A build out of MSVS GUI, but still using MSVC toolkit).

mkdir build-dir
cd build-dir
cmake -G "Visual Studio 15 2017 Win64" ..
cmake --build . --config Release
ctest -C Release --output-on-failure

eiennohito avatar Jun 06 '18 23:06 eiennohito

Wow! I did exactly you suggested me to do and voila! Only ctest output is enough, I think, but in case you need I attach the whole log file (6272 line long).

C:\data\jumanpp-2.0.0-rc2\build-dir>ctest -C Release --output-on-failure
Test project C:/data/jumanpp-2.0.0-rc2/build-dir
      Start  1: jpp_util_test
 1/10 Test  #1: jpp_util_test ....................   Passed    0.07 sec
      Start  2: testing_infra_test
 2/10 Test  #2: testing_infra_test ...............   Passed    0.05 sec
      Start  3: jpp_core_tests
 3/10 Test  #3: jpp_core_tests ...................   Passed    0.27 sec
      Start  4: jpp_core_analysis_tests
 4/10 Test  #4: jpp_core_analysis_tests ..........   Passed    0.72 sec
      Start  5: jpp_codegen_tests
 5/10 Test  #5: jpp_codegen_tests ................   Passed    0.09 sec
      Start  6: jpp_core_spec_tests
 6/10 Test  #6: jpp_core_spec_tests ..............   Passed    0.06 sec
      Start  7: jpp_core_train_tests
 7/10 Test  #7: jpp_core_train_tests .............   Passed    0.20 sec
      Start  8: jpp_jumandic_tests
 8/10 Test  #8: jpp_jumandic_tests ...............   Passed    0.33 sec
      Start  9: jpp_bug_tests
 9/10 Test  #9: jpp_bug_tests ....................   Passed    0.14 sec
      Start 10: jpp_rnn_tests
10/10 Test #10: jpp_rnn_tests ....................   Passed    0.22 sec

100% tests passed, 0 tests failed out of 10

Total Test time (real) =   2.24 sec

jumanpp-build-log2.txt.gz jumanpp08

Some further questions:

  1. After the build is done, I manually grab "{where source is located}\jumanpp-2.0.0-rc2\build-dir\src\jumandic\Release\jumanpp_v2.exe" and "{where source is located}\jumanpp-2.0.0-rc2\model\jumandic.jppmdl" and put them into "C:\Program Files\jumanpp" and "C:\Program Files\jumanpp\libexec\jumanpp" respectively (as config file "{where source is located}\jumanpp-2.0.0-rc2\model\jumandic.conf.in" suggested to put the dic file like that). Is there a command to install after build?

  2. I also copy the config file and put it in the binary directory, rename it to jumandic.conf and try to run, but fails (tried with original file name as well). I set a model option in command line right now. How can I direct the binary to use the config file automatically without -c option? Using -c option to specify the config file fails because I put it under C:\Program Files\jumanpp which include a space. Escaping a space by \ does not seem to work for me. Strange escape is not even needed in command line when double quoted.

C:\Program Files\jumanpp>jumanpp_v2.exe -c jumandic.conf sample.txt
failed to load model from disk: InvalidState: could not access file: "C:/Program errcode=123
backtrace:
    jumanpp::util::MappedFile::open at c:\data\jumanpp-2.0.0-rc2\src\util\mmap_impl_win32.h:66
    jumanpp::core::model::FilesystemModel::open at c:\data\jumanpp-2.0.0-rc2\src\core\impl\model_io.cc:111
    jumanpp::core::JumanppEnv::loadModel at c:\data\jumanpp-2.0.0-rc2\src\core\env.cc:11
    jumanpp::jumandic::JumanppExec::init at c:\data\jumanpp-2.0.0-rc2\src\jumandic\shared\jumandic_env.cc:26

Here is my content of jumandoc.conf

--model="C:/Program Files/jumanpp/libexec/jumanpp/jumandic.jppmdl"
--rnn-nce-bias=5.62844432562
--rnn-unk-constant=-3.4748115191
--rnn-unk-length=-2.92994951022
--feature-weight-perceptron=1
--feature-weight-rnn=0.0176

  1. Is there any way to control output character code? Above terminal is mintty. Windows default terminal won't display the output correctly.

Thank you!

taxen avatar Jun 07 '18 01:06 taxen

Nice to know that the build have passed. That means that at least MSBuild files generated by CMake are sane. I wonder that is the problem with the VS Solution.

  1. We want to build an installer for Windows. #81. Installing from the CMake source build... is a bit advanced thing on Windows.
  2. Thats a known bug :P. Please don't use spaces in the path names for the time being. I will fix it together with #83.
  3. I do not plan to add any character code convertion functionality in the binary itself. You can use (or write) a wrapper script if you want.

eiennohito avatar Jun 07 '18 03:06 eiennohito

Thank you for clearing those things out. Totally understandable especially about output code 3. This issue is now solved, I think.

taxen avatar Jun 07 '18 03:06 taxen

I have looked over your build log once more. It seems that by default CMake tries to generate build files for 32-bit. I haven't tested that configuration.

It probably should work, but a lot of Juman++ internal optimizations assume that we are using a 64-bit platform and have a sane number of 64-bit registers, which is obviously false on x86.

eiennohito avatar Jun 07 '18 06:06 eiennohito

I once tried cmake -G "Visual Studio 15 2017" .. but not cmake -G "Visual Studio 15 2017 Win64" .. So... Win64 makes the difference, I see.

taxen avatar Jun 07 '18 07:06 taxen

Visual Studio 15 2017 is always 32 bit project and you need append Win64 for 64-bit builds

It should be possible to add warning to CMake to generate at least warning in such cases

DoumanAsh avatar Nov 30 '18 05:11 DoumanAsh

Win32 build should not fail. It is a bug. It should be significantly slower (my guess is ~10x) than Win64 version though.

eiennohito avatar Nov 30 '18 05:11 eiennohito

Oh ok, I can check it out on my windows laptop since I have toolchain

DoumanAsh avatar Nov 30 '18 05:11 DoumanAsh

Tests still fail, so I reopen this issue.

Still, I don't know if this is a good idea to support 32 bit architecture at all. The problem is: Juman++ memory maps analysis model and it can be large: 300m. When embedding this thing into other applications you can run out of virtual memory very fast on 32 bits.

Also, hashing algorithm I use assumes that we have a lot of 64-bit integer registers. That will cause slowdown on x32 (I don't know how much though). Hashing is the cornerstone of the analysis speed by the way.

eiennohito avatar Dec 01 '18 07:12 eiennohito

@eiennohito Yeah, sorry, I suspect yesterday I tested it incorrectly.

Looking at tests:

TEST_CASE("murmur hash works with strings") {
  auto string = "this is a test string";
  auto ptr = reinterpret_cast<const jumanpp::u8*>(string);
  auto hv = murmurhash3_memory(ptr, ptr + sizeof(string), 10);
  CHECK(hv == 0x2e62c68031e4659dULL);
}

Even tests themself assume 64 bit integers, but internally murmur_hash uses size_t so you can easily lose accuracy

DoumanAsh avatar Dec 01 '18 07:12 DoumanAsh

That's a bug, it should use uint64_t. Otherwise models will not be portable.

eiennohito avatar Dec 01 '18 07:12 eiennohito

There are actually lots of warnings about implicit casts between size_t and uint64_t so to be sure I'd suggest to go through all of them... but it is lots to take on, so might take a while

DoumanAsh avatar Dec 01 '18 07:12 DoumanAsh

At least MurmurHash internal hashing representation should use fixed 64 bit integers.

But sized integer hygiene is... not good, yep.

eiennohito avatar Dec 01 '18 07:12 eiennohito

@eiennohito Just letting you know I for now give up with murmurhash3_memory tests, I just do not see where narrowing could happen that starts failing test. I doubt it is critical issue since x64 is recommended to be used, but well maybe someone else will try to tackle it. Or me at later time.

DoumanAsh avatar Dec 05 '18 14:12 DoumanAsh

Thanks for digging. I'm busy with a paper and upcoming PhD thesis right now, so I can't guarantee when I look into it, but I will try.

eiennohito avatar Dec 05 '18 15:12 eiennohito

If your project is sorta largish, I strongly recommend you to use Juman++ via gRPC. You will be protected from crashes and other weird things that way.

https://github.com/eiennohito/jumanpp-grpc

eiennohito avatar Dec 05 '18 15:12 eiennohito