SIBR_gaussianViewer_app build and runtime issues - "PyTorch is not linked with support for CUDA devices"

Open wright7 opened this issue 1 year ago • 2 comments

Hello,

I'm trying to build SIBR_gaussianViewer_app from your sources, but I'm running into runtime issues. Here is what I've done to build it, focusing on the debug configuration:

  1. I've cloned the repository; I'm using the main branch
  2. I've configured, generated and opened the project
     2.1. On Windows 11
     2.2. CMake GUI 3.29.2, where I added an OpenCV_RUNTIME=vc17 entry
     2.3. With Visual Studio 2022 LTSC 17.8 (17.8.12)
     2.4. Python 3.10
     2.5. CUDA 11.8
  3. I've copied the src/core/viewer folder from the original SIBR repository
  4. I've downloaded the debug version of libtorch from the link in your README and put it into the extlibs/ directory
  5. I've configured both the SIBR_gaussianViewer_app and sibr_gaussian projects in Visual Studio according to your README
  6. I've built SIBR_gaussianViewer_app (debug configuration) and copied all missing DLLs to the install directory

What I get from this build is an unhandled exception:

Unhandled exception at 0x00007FFE2FE1FABC in SIBR_gaussianViewer_app_d.exe: Microsoft C++ exception: c10::Error at memory location 0x0000009458EFA7D0.

It is thrown at this line:

TORCH_CHECK(p, "PyTorch is not linked with support for ", type, " devices");

from SIBR_viewers\extlibs\libtorch\debug\include\c10\core\impl\DeviceGuardImplInterface.h:318. It's caused by p being null, which means device_guard_impl_registry[1] (CUDA is 1) is null. The stack trace shows that the DLL being used is torch_cpu.dll.
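
The null registry slot described above can be pictured with a toy model. The sketch below is not the c10 code: the names GuardImpl, register_impl, get_impl and has_impl are made up for illustration. It assumes that each backend fills its own slot when its library is loaded, so a process in which only the CPU library ended up loaded would leave the CUDA slot (index 1) null.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy stand-in for a per-device backend. This is NOT the real c10 interface.
struct GuardImpl {
    const char* name;
};

enum class DeviceType : std::size_t { CPU = 0, CUDA = 1, Count = 2 };

// One slot per device type. A slot stays null until a backend registers itself.
static std::array<const GuardImpl*, static_cast<std::size_t>(DeviceType::Count)> registry{};

// In this model, each backend library would call this once when it is loaded.
void register_impl(DeviceType type, const GuardImpl* impl) {
    assert(impl != nullptr);
    registry[static_cast<std::size_t>(type)] = impl;
}

// Lookup that fails the same way the check in the report does: on a null slot.
const GuardImpl* get_impl(DeviceType type) {
    const GuardImpl* p = registry[static_cast<std::size_t>(type)];
    if (p == nullptr) {
        const std::string name = (type == DeviceType::CUDA) ? "CUDA" : "CPU";
        throw std::runtime_error("not linked with support for " + name + " devices");
    }
    return p;
}

// Convenience wrapper: true if a backend is registered for the given type.
bool has_impl(DeviceType type) {
    try {
        get_impl(type);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

In this model, registering only DeviceType::CPU and then calling get_impl(DeviceType::CUDA) throws, which mirrors the symptom in the report. Whether the real cause here is that the CUDA part of libtorch was never loaded is an assumption that the stack trace suggests but does not prove.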

I also tried using other builds of torch 1.10, but that caused a failure when loading data from file, which seems to be a library version compatibility issue. The failing call is: opacity_mlp_module = torch::jit::load(opacity_mlp_path, _libtorch_device);

Could you help me with this issue?

wright7, Aug 14 '24 11:08

Can you successfully compile with the release build?

cskrren, Aug 21 '24 06:08

Sorry for the delay in responding, I have been on holiday.

Yes, I can successfully compile the release build (using the release versions of libtorch, the DLLs, etc.), and the result looks similar. It's probably the same issue as in debug, with torch not being able to use CUDA:

[SIBR] --  INFOS  --:   Initialization of GLFW
[SIBR] --  INFOS  --:   OpenGL Version: 4.6.0 NVIDIA 560.94[major: 4, minor: 6]
[SIBR] --  INFOS  --:   Dataset type:
Number of input Images to read: 279
Number of Cameras set up: 279
LOADSFM: Try to open D:\repos\Scaffold-GS-Official\SIBR_viewers\data\kitchen/sparse/0/points3D.bin
Num 3D pts 241367
[SIBR] --  INFOS  --:   SfM Mesh 'D:\repos\Scaffold-GS-Official\SIBR_viewers\data\kitchen/sparse/0/points3d.bin successfully loaded.  (241367) vertices detected. Init GL ...
[SIBR] --  INFOS  --:   Init GL mesh complete
[SIBR] --  INFOS  --:   Loading models from: D:\repos\Scaffold-GS-Official\SIBR_viewers\ckpt\kitchen/point_cloud//
[SIBR] --  INFOS  --:   opacity_mlp : 1
[SIBR] --  INFOS  --:   cov_mlp : 1
[SIBR] --  INFOS  --:   color_mlp : 1
[SIBR] --  INFOS  --:   embedding_appearance : 0

D:/repos/Scaffold-GS-Official/SIBR_viewers/install/bin\SIBR_gaussianViewer_app.exe (process 46968) exited with code -1073740791.

It seems that the CUDA device count test in sibr::GaussianView::GaussianView passes (so the CUDA Toolkit recognizes the GPU as a CUDA device: cudaGetDeviceCount returns 1 and the device is then set), but libtorch then does not see this device and crashes.
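
The exit code in the log above is easier to interpret as an unsigned 32-bit NTSTATUS value. A minimal sketch of that conversion follows; the helper names to_ntstatus and to_hex are made up for illustration. The result, 0xC0000409, is STATUS_STACK_BUFFER_OVERRUN, which the MSVC runtime also reports on fail-fast termination such as abort() after an uncaught C++ exception. That would be consistent with the same c10::Error going unhandled in the release build, although the log itself does not confirm it.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret a signed process exit code as an unsigned 32-bit NTSTATUS value.
constexpr std::uint32_t to_ntstatus(std::int32_t exit_code) {
    return static_cast<std::uint32_t>(exit_code);
}

// Format the status as "0x" followed by eight uppercase hex digits.
std::string to_hex(std::uint32_t status) {
    char buf[11];  // "0x" + 8 hex digits + terminating null
    std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(status));
    return buf;
}

// The exit code reported for SIBR_gaussianViewer_app.exe in the log.
static_assert(to_ntstatus(-1073740791) == 0xC0000409u,
              "exit code maps to 0xC0000409");
```

Calling to_hex(to_ntstatus(-1073740791)) yields the string that can be looked up in ntstatus.h or in the NTSTATUS value tables.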

wright7, Sep 02 '24 13:09