Error building Pytorch TOP with CMake

Open theflashbacker opened this issue 3 years ago • 19 comments

Hi, thanks a lot for this! I'm really looking forward to experimenting with PyTorch within TD. I'm a total noob at compiling with CMake, so sorry in advance for noob mistakes.

I'm stuck just before generating files, and I get these errors:

CMake Error at C:/Program Files/CMake/share/cmake-3.19/Modules/FindCUDA.cmake:1842 (add_library):
  Cannot find source file:

    src/CPlusPlus_Common.h

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
  .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .ispc
Call Stack (most recent call first):
  CMakeLists.txt:119 (CUDA_ADD_LIBRARY)


CMake Error at C:/Program Files/CMake/share/cmake-3.19/Modules/FindCUDA.cmake:1842 (add_library):
  No SOURCES given to target: PyTorchTOP
Call Stack (most recent call first):
  CMakeLists.txt:119 (CUDA_ADD_LIBRARY)


CMake Generate step failed.  Build files cannot be regenerated correctly.

I had to change a couple of things compared to the ReadMe to get this far in the PyTorchTOP build process; otherwise I had other errors:

  • I copied the CMakeLists.txt into the build folder
  • I had to unzip the libtorch folder for CMake, or I ran into this error:
CMake Error at CMakeLists.txt:126 (find_package):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.

  Could not find a package configuration file provided by "Torch" with any of
  the following names:

    TorchConfig.cmake
    torch-config.cmake

  Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
  "Torch_DIR" to a directory containing one of the above files.  If "Torch"
  provides a separate development package or SDK, be sure it has been
  installed.

Maybe this is what caused my latest issue.

I hope you can help, and thanks again for this project.

theflashbacker avatar Mar 02 '21 10:03 theflashbacker

You shouldn't copy the CMakeLists.txt file to a new location. My guess is that your build folder is in the wrong location. It should be next to the src folder, the models folder, and PyTorchTOP.toe

You do need to unzip the libtorch download. Suppose you unzip it to C:/libtorch. Then inside the build folder you'll do cmake -DCMAKE_PREFIX_PATH=C:/libtorch .. The double period at the end is necessary too.
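A minimal sketch of the check behind this advice, in Python. It looks for TorchConfig.cmake under an unzipped libtorch folder (the file that find_package needs to reach through CMAKE_PREFIX_PATH) and assembles the configure command to run from inside the build folder. The helper names are hypothetical and C:/libtorch is just the example location used above; none of this is part of the repository.

```python
from pathlib import Path


def find_torch_config(libtorch_dir):
    """Return the first TorchConfig.cmake found under libtorch_dir, or None."""
    root = Path(libtorch_dir)
    if not root.is_dir():
        return None
    # Search recursively instead of hard-coding the layout of the archive.
    return next(root.rglob("TorchConfig.cmake"), None)


def cmake_configure_command(libtorch_dir):
    """Build the argument list for the configure step, run from inside build/."""
    prefix = Path(libtorch_dir).as_posix()
    # The trailing ".." points CMake at the parent folder holding CMakeLists.txt.
    return ["cmake", f"-DCMAKE_PREFIX_PATH={prefix}", ".."]
```

If find_torch_config returns None, the archive is probably still zipped or the prefix points at the wrong folder, which matches the "Could not find a package configuration file" error quoted earlier.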

DBraun avatar Mar 02 '21 18:03 DBraun

Thanks for the quick feedback. I indeed forgot the double period at the end of the CMake command, which was my problem.

Now I can generate the project, but when I build in Visual Studio I get this error: 'opencv2/cudafeatures2d.hpp': No such file or directory (project PyTorchTOP, file C:\Users\mleth\Documents\Projects\TD Tutorials\PyTorchTOP\src\PyTorchTOP.h, line 20)

Then, when opening PyTorchTOP.toe, I get this error on cplusplus_background_matting: "Error: failed to load the .dll". But I do have PyTorchTOP.dll in my Plugins folder.

When I built OpenCV I got a lot of warnings and it took a really long time. I wonder if it is related.

theflashbacker avatar Mar 04 '21 09:03 theflashbacker

Did you move the opencv_contrib repository after using it in the steps to build opencv? If so, you might need to move it back to its original location. Or you could set an environment variable for its new location:

set OPENCV_EXTRA_MODULES_PATH=C:/path/to/opencv_contrib/modules

Then the same as before:

cmake -DCMAKE_PREFIX_PATH=C:/libtorch ..

For context this is the relevant file on my computer: C:\opencv_contrib\modules\cudafeatures2d\include\opencv2\cudafeatures2d.hpp

You can right-click on PyTorchTOP in the Solution Explorer, then "Properties", "C/C++", "General", "Additional Include Directories" to see where it's searching for this cudafeatures2d.hpp. I see C:\opencv_contrib\modules\cudafeatures2d\include listed.
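As a rough sketch, the same lookup can be done outside Visual Studio with a few lines of Python: list the include folders under an opencv_contrib modules path and report which one, if any, holds the missing header. The function names are hypothetical, and the one-include-folder-per-module layout is assumed from the path I quoted above.

```python
from pathlib import Path


def contrib_include_dirs(extra_modules_path):
    """List every <module>/include folder under an opencv_contrib modules path."""
    root = Path(extra_modules_path)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*/include") if p.is_dir())


def find_header(include_dirs, header="opencv2/cudafeatures2d.hpp"):
    """Return the first include directory that contains `header`, or None."""
    for directory in include_dirs:
        if (Path(directory) / header).is_file():
            return Path(directory)
    return None
```

Calling find_header(contrib_include_dirs("C:/opencv_contrib/modules")) should return the cudafeatures2d include folder on a setup like mine; None suggests the modules path is wrong.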

DBraun avatar Mar 04 '21 20:03 DBraun

I didn't move opencv_contrib after building OpenCV, but it was not appearing in the Solution Explorer, so I set the environment variable as you suggested: set OPENCV_EXTRA_MODULES_PATH=C:/path/to/opencv_contrib/modules and it worked.

Unfortunately, I still get an error when TD launches, saying: The plugin PyTorchTOP.dll failed to load. This likely means it depends on other .dlls which are missing

I guess it could come from my opencv_world451.dll, which might be corrupted since I get so many warnings when building OpenCV (or something else?).

So I tried building OpenCV one more time (a process which takes around 6 hours on my side). It stays stuck on warning messages like these for a really long time:

1>CUSTOMBUILD : nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>C:/Users/mleth/git libs/opencv/modules/core/include\opencv2/core/types.hpp(530): warning : field of class type without a DLL interface used in a class with a DLL interface
1>
1>C:/Users/mleth/git libs/opencv/modules/core/include\opencv2/core/types.hpp(532): warning : field of class type without a DLL interface used in a class with a DLL interface
1>
1>C:/Users/mleth/git libs/opencv/modules/core/include\opencv2/core/types.hpp(771): warning : field of class type without a DLL interface used in a class with a DLL interface
1>
1>C:/Users/mleth/git libs/opencv/modules/core/include\opencv2/core/mat.hpp(261): warning : field of class type without a DLL interface used in a class with a DLL interface
1>
1>C:/Users/mleth/git libs/opencv/modules/core/include\opencv2/core/mat.hpp(572): warning : field of class type without a DLL interface used in a class with a DLL interface
1>
1>C:/Users/mleth/git libs/opencv/modules/core/include\opencv2/core/mat.hpp(2685): warning : field of class type without a DLL interface used in a class with a DLL interface

then another pile of: The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). opencv_world C:\Users\mleth\git libs\opencv\build\modules\world\CUSTOMBUILD 1

I guess part of the problem could come from there. Would you have any suggestions? Thanks again for your precious help.

theflashbacker avatar Mar 05 '21 16:03 theflashbacker

In this path %USERPROFILE%\Documents\Derivative\Plugins do you have both opencv_world451.dll and PyTorchTOP.dll? My opencv_world451.dll is 975 MB so I hope yours is about the same size. What's your GPU and driver?
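If it helps, here is a small Python sketch that answers the first two questions in one go: it reports the size of each DLL in the Plugins folder, or None when a file is missing. The function names are made up for illustration, and the default folder assumes the standard Documents location, which may not hold if Documents is redirected.

```python
import os
from pathlib import Path


def default_plugin_dir():
    """Plugins folder named in this thread; falls back to the home directory."""
    home = Path(os.environ.get("USERPROFILE", Path.home()))
    return home / "Documents" / "Derivative" / "Plugins"


def dll_sizes_mb(plugin_dir, names=("PyTorchTOP.dll", "opencv_world451.dll")):
    """Map each DLL name to its size in MB, or None if the file is missing."""
    sizes = {}
    for name in names:
        path = Path(plugin_dir) / name
        sizes[name] = round(path.stat().st_size / 1e6, 1) if path.is_file() else None
    return sizes
```

Running dll_sizes_mb(default_plugin_dir()) gives numbers you can compare directly against the 975 MB figure above.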

DBraun avatar Mar 05 '21 18:03 DBraun

Yes, I have both .dll files in this folder. My opencv_world451.dll file is 1.02 GB.

I have an Nvidia GTX 1070 Max-Q (laptop version) with Studio driver 461.72 (from February 25th).

The CUDA version is actually 11.2; I hope that isn't the problem.

theflashbacker avatar Mar 05 '21 22:03 theflashbacker

Hi,

So I restarted from scratch to try not to get mixed up with the versions, and here is the situation with CUDA Toolkit 11.1 (somehow the 11.0.x versions are not available to download anymore) and libtorch 1.8.0 (stable version for CUDA 11.1).

Building OpenCV was much faster (about 1h30), but my generated opencv_world451.dll is only 200 MB. When building PyTorchTOP, if I use libtorch 1.8.0, I run into 2 errors: 'std' : ambiguous symbol

pointing to C:\Users\mleth\AppData\Local\Programs\Python\libtorch-win-shared-with-deps-1.8.0+cu111\libtorch\include\c10\util\C++17.h lines 303 and 305

if constexpr(Condition) {
    if constexpr (detail::function_takes_identity_argument<ThenCallback>::value) {
      return std::forward<ThenCallback>(thenCallback)(detail::_identity());
    } else {
      return std::forward<ThenCallback>(thenCallback)();
    }
  }

which is apparently some problem with VS (which I don't understand how to fix), but it prevents it from generating PyTorchTOP.vcxproj.

If I use libtorch 1.7.1, I don't get this error but still can't load PyTorchTOP.dll in TD.

theflashbacker avatar Mar 08 '21 11:03 theflashbacker

I put some screenshots of my CMake-GUI here https://github.com/DBraun/PyTorchTOP/issues/15#issuecomment-774756385 Can you compare to your setup? I think it's weird that opencv_world451.dll got smaller. Did you include the opencv_modules this time? Also, C:\Users\me\AppData\ is a hidden folder, so it's not a good place to put files in my opinion.

DBraun avatar Mar 09 '21 03:03 DBraun

I compared my CMake-GUI setup to yours and noticed a few differences:

  • CUDA_GENERATION was on "auto"; I set it to blank as in your setup
  • OPENCV_ENABLE_NONFREE was checked; I unchecked it
  • OPENCV_FORCE_PYTHON_LIBS: I don't have this option, and it is checked in your setup
  • I didn't have any Python 2 installed; I installed it just in case (since I read something about it in the thread you linked)

I changed my paths for Python and libtorch to be at the root, and made sure opencv_modules was still included (it was). I'm still running into the same problem.

I have one error at the end of the build which says "can't start opencv/build/x64/Release/ALL_BUILD, access refused". And the opencv_world451.dll is still 200 MB.

theflashbacker avatar Mar 09 '21 15:03 theflashbacker

The last part sounds like you're trying to "run" ALL_BUILD but you want to just "build" it. Other than that I'm not sure what's going on.

DBraun avatar Mar 10 '21 02:03 DBraun

OK, thank you for your help. I'll try other things in the next few days and let you know if I figure something out.

theflashbacker avatar Mar 10 '21 13:03 theflashbacker

In case anyone else finds themselves here, I had some of the same issues compiling the plugin with the currently available versions: libtorch 1.8.1, CUDA 11.1.
I could not compile the PyTorchTOP plugin, getting the same error as above in C++17.h: 'std' : ambiguous symbol
Digging around, it's libtorch related and reported fixed in the nightly preview, which I could build, but the build did not run the post-build command to copy the DLLs. After moving these manually, I could launch the .toe without error, but it's not showing any matting. I'm not yet sure why or how to debug, since it's not showing any errors. I have the TorchScript .pth files in the models dir and the correct bytes per channel. The style transfer branch also shows no errors and no results for the TOP.

kromond avatar Apr 10 '21 01:04 kromond

@kromond Have you disabled the Unload Plugin parameter?

DBraun avatar Apr 14 '21 03:04 DBraun

I did toggle that and saw no issue (no red x or error). I figured it was because I had different versions of CUDA and libtorch, so I started over with CUDA 11.0 and libtorch 1.7.1, which turned out to be not that hard to find by modifying the URL that the PyTorch web page produces (https://download.pytorch.org/libtorch/cu110/libtorch-win-shared-with-deps-1.7.1%2Bcu110.zip). I recompiled both OpenCV and the plugin without errors, except that once again the build and launch did not run the post-build command to copy the files to the plugin directory. After doing this manually, the plugin does not want to load. Does this look correct?
image
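For anyone wanting to repeat the URL trick, a minimal Python sketch follows. The pattern is inferred only from the one link above, so treat it as a guess: other version and CUDA combinations may not exist at that address. The function name is hypothetical.

```python
from urllib.parse import quote


def libtorch_windows_url(version, cuda_tag):
    """Build a libtorch download URL following the pattern of the link above."""
    # The "+" between version and CUDA tag is percent-encoded as %2B in the URL.
    filename = quote(f"libtorch-win-shared-with-deps-{version}+{cuda_tag}.zip")
    return f"https://download.pytorch.org/libtorch/{cuda_tag}/{filename}"
```

For example, libtorch_windows_url("1.7.1", "cu110") reproduces the link I used.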

kromond avatar Apr 14 '21 06:04 kromond

Sorry for seeing this late again.

Have you downloaded a model from the TorchScript folder on the Google Drive? https://github.com/PeterL1n/BackgroundMattingV2#download

Every time you recompile and open TouchDesigner, does it say, this plugin hasn't been loaded yet on this computer etc?

Could you upload a screenshot of TouchDesigner?

In the past I had to copy libtorch's libiomp5md.dll into C:/Program Files/Derivative/TouchDesigner/bin but it stopped being necessary around the time PyTorchTOP was published.

https://github.com/lucasg/Dependencies/releases is also useful for seeing what DLLs a DLL might be dependent on.

DBraun avatar Apr 19 '21 18:04 DBraun

I have downloaded the weights, torchscript version, as above, yes. When I recompile and open TD, yes it says this plugin has not been loaded etc.

I have tried the libiomp5md.dll suggestion but no change. I will have another go and see if there was a step I missed

image

kromond avatar Apr 24 '21 03:04 kromond

https://github.com/lucasg/Dependencies/releases can probably reveal what PyTorchTOP.dll is depending on but can’t find. Maybe some helpful info is in the output of the cmake command that makes the PyTorchTOP.sln.
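A quick first check before opening Dependencies, sketched in Python: try to load the DLL with ctypes and print the loader's error. This is only a hint. Python resolves dependent DLLs with its own search rules, which are assumed here to differ from TouchDesigner's, and the error does not name which dependency is missing. The function name is hypothetical.

```python
import ctypes


def try_load(dll_path):
    """Try to load a shared library; return (True, "") or (False, loader error)."""
    try:
        ctypes.CDLL(str(dll_path))
    except OSError as exc:
        # A missing file or a missing dependency both surface as OSError.
        return False, str(exc)
    return True, ""
```

Pass the full path to PyTorchTOP.dll in the Plugins folder. A failure here on a file that clearly exists points toward a missing dependency, which Dependencies can then identify.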

DBraun avatar Apr 24 '21 04:04 DBraun

I am very excited to report that it worked this time. I am not too sure why. I cleared out the plugins dir and started over. I did not recompile OpenCV, just the TOP. I figured out why the post-build command was not copying the DLLs over: my user Documents folder is not local, and it was not resolving the path. I changed the CMakeLists.txt to use an explicit path, and it compiled and copied this time. When I launched, it did not give me the message about the new plugin, which surprised me because I had deleted the Plugins.json file and it didn't generate a new one. Anyway, it works! Thanks so much for making this and all the other cool things you make.
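In case the post-build copy fails for someone else in the same way, this is roughly the manual equivalent as a Python sketch. It takes an explicit destination instead of deriving one from the user profile. The function name is hypothetical, and which DLLs the plugin needs alongside it is not something this sketch knows; it just copies every .dll it finds in the build output.

```python
import shutil
from pathlib import Path


def copy_dlls(build_output_dir, plugin_dir):
    """Copy every .dll from the build output into plugin_dir; return copied names."""
    destination = Path(plugin_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in sorted(Path(build_output_dir).glob("*.dll")):
        # copy2 keeps timestamps, which makes stale copies easier to spot.
        shutil.copy2(dll, destination / dll.name)
        copied.append(dll.name)
    return copied
```

Point plugin_dir at wherever your Documents\Derivative\Plugins folder really lives, since that was the part that did not resolve for me.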

Another question, I noticed I can not load the style transfer models onto the plugin. Is that because these are a different format (not torchscript) and I'd need to compile the other branch for this? Does that also mean it's not reasonable to think I could load the MiDaS weights?

kromond avatar Apr 24 '21 04:04 kromond

Great, I'm glad it worked out. Maybe there's something about the CMakeLists.txt that needs to be improved.

For Style Transfer, you should check out the branch: https://github.com/DBraun/PyTorchTOP/commit/82da157c7c0b04a7b98e6c19e9beb1a7055ba860

You can follow the same instructions from https://github.com/DBraun/PyTorchTOP-cpumem/#neural-style-transfer to make the jit traced pt file. This file is a model that outputs 32-bit RGB, but 3 channel output doesn't work well in CUDA, so I chose to append an alpha channel to make it RGBA. That's this code: https://github.com/DBraun/PyTorchTOP/commit/82da157c7c0b04a7b98e6c19e9beb1a7055ba860#diff-ce3cbc137ad2f17ce74d4ae23975d9b79f599c6650339854e820d9ce1c4b937dR30-R40

It's not necessary for the Style Transfer branch, but for other projects, it's good to add a model wrapper and minimize the work done in C++/libtorch.

import torch


class ModelWrapper(torch.nn.Module):

    def __init__(self, model):
        super(ModelWrapper, self).__init__()

        self._model = model

    def preprocess(self, x):

        # Sometimes you'll need to preprocess or normalize your data.
        # ML researchers often put this code in DataLoaders, but you'll want to do it here
        # because it's usually easier to do in Python than C++. TorchScript will also take care of optimizing
        # it for us, whereas my handcrafted C++ code could be unoptimized.
        # Examples of preprocessing might be narrowing the channels from RGBA to RGB,
        # normalizing with a mean/std, or just rescaling between the min/max of the data to 0-1.

        # x = something with x

        return x

    def forward(self, x):

        x = self.preprocess(x)

        # run the model
        rgb = self._model(x)

        # Do the extra work here so that you don't have to do it in C++ (like adding an alpha channel).
        # This assumes rgb is shaped (N, 3, H, W); change dim if your model uses another layout.
        # ones_like keeps the alpha channel on the same device and dtype as rgb.
        alpha = torch.ones_like(rgb[:, :1])
        rgba = torch.cat((rgb, alpha), dim=1)

        return rgba

So instead of

traced_script_module = torch.jit.trace(style_model, content_image)
traced_script_module.save("traced_model.pt")

You can do

wrapper = ModelWrapper(style_model).eval().cuda()
traced_script_module = torch.jit.trace(wrapper, content_image)
traced_script_module.save("traced_model_1.pt")

# torch.jit.script can work too
sm = torch.jit.script(wrapper)
torch.jit.save(sm, "traced_model_2.pt")

I have U-2-Net working in a plugin and it was important/convenient to take care of everything with a wrapper. Also note that if you add a preprocess method in Python, you'll want to remove the preprocess code from the DataLoader and make sure that the output of the wrapper is the same as the output of the combination of the DataLoader with preprocessor and the original model.
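A minimal sketch of that equivalence check, assuming the preprocessing is a simple mean/std normalization. The Wrapped class and outputs_match helper are hypothetical stand-ins, not code from the plugin or from U-2-Net.

```python
import torch


class Wrapped(torch.nn.Module):
    """Hypothetical wrapper that normalizes inside the module."""

    def __init__(self, model, mean, std):
        super().__init__()
        self._model = model
        self.mean = mean
        self.std = std

    def forward(self, x):
        return self._model((x - self.mean) / self.std)


def outputs_match(wrapper, model, external_preprocess, x, atol=1e-6):
    """True if wrapper(x) equals model(external_preprocess(x)) within atol."""
    with torch.no_grad():
        return torch.allclose(wrapper(x), model(external_preprocess(x)), atol=atol)
```

Here external_preprocess stands for whatever the DataLoader used to do. If outputs_match returns False, the wrapper and the old pipeline disagree and the plugin would see different results from the Python reference.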

Not everything that's a pt file is a jit traced file. You usually have to trace it yourself. The style-transfer branch is actually a better starting point because it doesn't have the opencv/homography stuff. I think the main branch needs a bit of a refactor to be a better starting project for PyTorch TOPs in general.

Some models don't work well in jit.trace and jit.script. They might involve dynamic branching/memory allocation. So if you're going to see if something might work in a plugin you should start with trying to trace it first.
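Something like the following Python sketch is what I mean by trying to trace first. The can_trace helper is hypothetical. Note that passing on one example input does not prove the trace generalizes, because torch.jit.trace records only the operations run for that input, so data-dependent branches get baked in.

```python
import torch


def can_trace(model, example, atol=1e-5):
    """Try torch.jit.trace and compare traced vs eager output on `example`."""
    model = model.eval()
    try:
        with torch.no_grad():
            traced = torch.jit.trace(model, example)
            same = torch.allclose(traced(example), model(example), atol=atol)
    except Exception as exc:  # tracing errors vary by model, so catch broadly
        return False, f"{type(exc).__name__}: {exc}"
    if same:
        return True, ""
    return False, "traced output differs from eager output"
```

It is worth calling this with a second input of a different size or content as well, and watching for TracerWarning messages during tracing.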

DBraun avatar Apr 24 '21 05:04 DBraun