Build CherryPi on Windows: cmake command fails

Open koalarun opened this issue 6 years ago • 16 comments

I ran the full command from the installation guide:

cmake .. -DMSVC=true -DZMQ_LIBRARY="../3rdparty/zmq.lib" -DZMQ_INCLUDE_DIR="../3rdparty/libzmq/include" -DGFLAGS_LIBRARY="../3rdparty/gflags_static.lib" -DGFLAGS_INCLUDE_DIR="../3rdparty/gflags/build/include" -DGLOG_ROOT_DIR="../3rdparty/glog" -DCMAKE_CXX_FLAGS_RELEASE="/MP /EHsc" -G "Visual Studio 15 2017 Win64"

and this error occurred:

CMake Error in common/CMakeLists.txt:
  Imported target "Torch" includes non-existent path
    "D:/StarCraftAI/TorchCraft/TorchCraftAI/3rdparty/pytorch/torch/lib/tmp_install/include/THC"
  in its INTERFACE_INCLUDE_DIRECTORIES.

So the following command, "msbuild CherryPi.sln /property:Configuration=Release /m", also failed.

koalarun avatar Nov 27 '18 16:11 koalarun

This suggests to me that you somehow built PyTorch without CUDA support, while TorchCraftAI thinks you do have CUDA. I wonder if there's something strange with your CUDA installation? Alternatively, try to make sure PyTorch builds with CUDA support.
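One quick way to test this hypothesis is to check which header directories the PyTorch build actually produced. The sketch below assumes that the THC headers are only installed when PyTorch is built with CUDA; the helper name `missing_include_dirs` is hypothetical, and the path is the one from the CMake error above, so adjust it to your checkout.

```python
from pathlib import Path


def missing_include_dirs(install_include, subdirs=("TH", "THC")):
    """Return the expected header subdirectories that do not exist.

    Assumption: "THC" is only present after a CUDA-enabled build.
    """
    root = Path(install_include)
    return [name for name in subdirs if not (root / name).is_dir()]


# Path copied from the CMake error message; change it for your machine.
include_root = (
    "D:/StarCraftAI/TorchCraft/TorchCraftAI/3rdparty/pytorch/"
    "torch/lib/tmp_install/include"
)
print("Missing header directories:", missing_include_dirs(include_root))
```

If "THC" shows up as missing while "TH" is present, the build most likely ran without CUDA.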

ebetica avatar Nov 27 '18 19:11 ebetica

@ebetica I have downloaded and installed CUDA 9.2, and I used the command from the installation guide:

conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing

Then I changed into the pytorch directory and ran

python setup.py build

After a long time, it finished successfully without errors. (The first time, I installed CUDA 10.0, but the PyTorch build failed.)

Did I make a mistake somewhere?

koalarun avatar Nov 28 '18 08:11 koalarun

I installed Patch 1 for CUDA 9.2 and rebuilt PyTorch, but it failed. I collected some of the errors from the output:

Library mkl_intel_lp64: not found

Library mkl_intel: not found

CMake Error at cmake/public/cuda.cmake:123 (file): file failed to open for reading (No such file or directory):  \=//cudnn.h

Phew, this is hard work. I hope there will be a release package that is easy to install.

koalarun avatar Nov 28 '18 13:11 koalarun

If you can give me your full PyTorch and TorchCraftAI build logs, I might be able to help in more depth.

The second error you pasted suggests to me you don't have anaconda in your environment, since mkl was installed above in conda install ... mkl ....
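To check this from the same shell you build in, something like the sketch below may help. It assumes conda exports `CONDA_PREFIX` when an environment is active and that the `mkl-include` package places `mkl.h` under `Library\include` (Windows) or `include` (Linux) inside that prefix. The helper name `find_mkl_header` is made up for illustration.

```python
import os
from pathlib import Path


def find_mkl_header(environ=os.environ):
    """Return the path to mkl.h in the active conda env, or None.

    Assumption: an active conda environment sets CONDA_PREFIX, and
    mkl-include installs mkl.h in one of the two locations below.
    """
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None  # no conda environment is active in this shell
    candidates = [
        Path(prefix) / "Library" / "include" / "mkl.h",  # Windows layout
        Path(prefix) / "include" / "mkl.h",              # Linux layout
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


print("mkl.h found at:", find_mkl_header())
```

A result of `None` means either that no conda environment is active in that shell or that the MKL headers are not installed in it.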

ebetica avatar Nov 28 '18 18:11 ebetica

@ebetica I reinstalled Windows 10 and tried again with CUDA 9.2, and also installed Patch 1 for CUDA 9.2. It failed again with the same error as above: mkl not found. I suspect that not every CUDA version is suitable for building the PyTorch bundled with TorchCraftAI. Could you tell me which CUDA version you installed to build PyTorch?

koalarun avatar Dec 02 '18 13:12 koalarun

I want to ask whether you have managed to build this environment. I have run into a lot of problems that I cannot solve.

bwangll avatar Dec 06 '18 05:12 bwangll

Hey, for the Windows build we use Windows 10 with CUDA 9.2. Whether the patches are installed shouldn't matter. Did you also install the Anaconda environment? That is where conda install and Python support come from.

dexterju27 avatar Dec 06 '18 14:12 dexterju27

@dexterju I did not install CUDA. Is this the main reason for my PyTorch error? The error it reports is 'tools\build_pytorch_libs.bat --use-fbgemm --use-nnpack --use-mkldnn --use-qnpack caffe2'

bwangll avatar Dec 06 '18 14:12 bwangll

Do you have Anaconda installed? And did you run the conda install command as described in the tutorial?

By the way, what you pasted here is not the error itself; it only tells you that this .bat file failed to run. The output should contain more information about why it failed. Do you have it? If not, you can go into the script, set all the environment variables, and run this line manually to see what is triggering the failure.

dexterju27 avatar Dec 06 '18 14:12 dexterju27

@koalarun MKL is the Intel library for fast CPU numerical evaluations. It should not be affected by which version of CUDA you run.

Actually, I think the easiest way to set things up is to compile TorchCraftAI on a Linux machine. You can either run OpenBW with a 4.20 bot, or use a Windows VM to run StarCraft. We do not really use Windows for development, so we don't know the kinks of the Windows build process as well as we know the Linux setup process.

ebetica avatar Dec 06 '18 20:12 ebetica

@dexterju @ebetica Thank you for your replies. Today I looked at the PyTorch build error carefully and found that the CMake error occurs while building caffe2:

 "CMake Error at cmake/public/cuda.cmake:123 (file):
  file failed to open for reading (No such file or directory):"

I opened the cuda.cmake file; lines 120-123 are:

if(CAFFE2_USE_CUDNN)
  # Get cuDNN version
  file(READ ${CUDNN_INCLUDE_DIR}/cudnn.h CUDNN_HEADER_CONTENTS)

I suspect the logic in cuda.cmake that decides whether to use cuDNN may be wrong:

if(NOT CUDNN_FOUND)
  message(WARNING
    "Caffe2: Cannot find cuDNN library. Turning the option off")
  set(CAFFE2_USE_CUDNN OFF)
else()
  set(CAFFE2_USE_CUDNN ON)
endif()

The variable CUDNN_FOUND does not seem to be defined; it only appears here. So it seems CAFFE2_USE_CUDNN is always ON. I have installed CUDA, but I could not find the cudnn.h file on my computer, so I may need to install cuDNN manually. I used Anaconda to install it with conda install -c anaconda cudnn and copied the bin, include, and lib files into the CUDA install directory.
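To confirm that the copied header is where the build expects it, a small check along these lines could be used. It is only a sketch: it assumes cudnn.h defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` as plain `#define` lines, and that the CUDA installer set the `CUDA_PATH` environment variable. The function name `read_cudnn_version` is hypothetical.

```python
import os
import re
from pathlib import Path


def read_cudnn_version(include_dir):
    """Return (major, minor, patch) parsed from cudnn.h, or None.

    None means the header is missing or lacks the expected defines.
    """
    header = Path(include_dir) / "cudnn.h"
    if not header.is_file():
        return None
    text = header.read_text(errors="ignore")
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Assumption: CUDA_PATH points at the CUDA toolkit install directory.
cuda_include = Path(os.environ.get("CUDA_PATH", ".")) / "include"
print("cuDNN version:", read_cudnn_version(cuda_include))
```

A `None` result here would match the "file failed to open for reading" error, since the build tries to read the same header.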

Now I am running python setup.py build again. It is running without errors so far... I hope it succeeds.


Argh, the newest version of VS2017 is not compatible with CUDA 9.2, so I uninstalled CUDA 9.2, installed CUDA 10.0, and rebuilt.


Oh, another new error...

  Error : Internal Compiler error (codegen): "there was an error in verifying the lgenfe output!" [D:\StarCraftAI\TorchCraft\TorchCraftAI\3rdparty\pytorch\build\caffe2\caffe2_gpu.vcxproj]

and there is no more information...

koalarun avatar Dec 08 '18 04:12 koalarun

@koalarun I got the same error as you.

bwangll avatar Dec 08 '18 10:12 bwangll

@koalarun I suspect you have some leftover build files from the last run. Try python setup.py clean before you rebuild.

ebetica avatar Dec 10 '18 15:12 ebetica

I deleted the build directory and ran 'python setup.py clean', but it did not help; the error is still "there was an error in verifying the lgenfe output!".

koalarun avatar Dec 11 '18 14:12 koalarun

Could this explain the issue? https://github.com/pytorch/pytorch/issues/12117

I'm not exactly sure what the fix is, since we don't observe it on our machines... It sounds like 9.2 + older VS2017 is the way to go?

ebetica avatar Dec 12 '18 19:12 ebetica

Maybe I'll wait for the CUDA 10.1 release...

koalarun avatar Dec 13 '18 06:12 koalarun