plotoptix

Setup Error: PathTracer destructor failed.

Open jszym opened this issue 4 years ago • 9 comments

Hi there!

I seem to be running into an issue when trying to run the example in README.md on Windows with CUDA 10.1.

I'm running a dual-GPU setup: the first GPU is an RX 480 (display) and the second is an RTX 2080 (which I intend to use for rendering). Could this be the issue?

Here's the code I've run and the resulting error:

C:\****\****>python
Python 3.7.5 (default, Oct 31 2019, 15:18:51) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> from plotoptix import TkOptiX
>>>
>>> n = 1000000                                  # 1M points, better not try this with matplotlib
>>> xyz = 3 * (np.random.random((n, 3)) - 0.5)   # random 3D positions
>>> r = 0.02 * np.random.random(n) + 0.002       # random radii
>>>
>>> plot = TkOptiX()
[Py-C# interop]
PathTracer destructor failed.
[ERROR] (MainThread) Initial setup failed, see errors above.
>>> plot.set_data("my plot", xyz, r=r)
[ERROR] (MainThread) Geometry setup failed.
>>> plot.show()
[ERROR] (MainThread) Camera setup failed.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\jszym\Anaconda3\envs\pytorch\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\jszym\Anaconda3\envs\pytorch\lib\site-packages\plotoptix\npoptix.py", line 262, in run
    assert self._is_scene_created, "Scene is not ready, see initialization messages."
AssertionError: Scene is not ready, see initialization messages.

[ERROR] (MainThread) Raytracing output startup timed out.
>>>

Here's the CUDA version

C:\****\****>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243

Here are some details about my system

| Field | Value |
|---|---|
| OS Name | Microsoft Windows 10 Pro |
| OS Manufacturer | Microsoft Corporation |
| System Manufacturer | Micro-Star International Co., Ltd. |
| System Model | MS-7B09 |
| System Type | x64-based PC |
| Processor | AMD Ryzen Threadripper 1950X 16-Core Processor, 3400 Mhz, 32 Logical Processor(s) |
| BaseBoard Manufacturer | Micro-Star International Co., Ltd. |
| BaseBoard Product | X399 GAMING PRO CARBON AC (MS-7B09) |
| BaseBoard Version | 1.0 |
| Installed Physical Memory (RAM) | 32.0 GB |

GPU 1 details

| Field | Value |
|---|---|
| Name | Radeon (TM) RX 480 Graphics |
| Adapter Type | AMD Radeon Graphics Processor (0x67DF), Advanced Micro Devices, Inc. compatible |
| Adapter Description | Radeon (TM) RX 480 Graphics |
| Adapter RAM | (1,048,576) bytes |
| Driver Version | 26.20.15002.61 |

GPU 2 details

| Field | Value |
|---|---|
| Name | NVIDIA GeForce RTX 2080 |
| Adapter Type | GeForce RTX 2080, NVIDIA compatible |
| Adapter Description | NVIDIA GeForce RTX 2080 |
| Adapter RAM | (1,048,576) bytes |
| Driver Version | 26.21.14.3200 |

Also wanted to thank you so much for your work, this is a fantastic project.

jszym, May 01 '20 17:05

Hi Joseph,

Thanks for reporting! Yes, having the NVIDIA card placed after the Radeon is likely the cause. Configuring which board indexes are visible to the package was already on my to-do list. Let me implement that and we'll see how it works for you. I plan to make the next release within a week.

robertsulej, May 02 '20 10:05

The new release is out; I hope it fixes the problem. Please try import plotoptix: this should list the devices that are compatible with CUDA and meet the minimum compute capability of 5.0. I guess that in a mixed Radeon/NVIDIA configuration the NVIDIA index should be 0 even if the card is installed as the second one, but I did not test that. I verified the code on GCP with multiple GPUs, but all of them were NVIDIA boards.
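To check independently which ordinals the NVIDIA driver assigns in a mixed setup, the CUDA driver API can be queried directly. This is only a sketch: it assumes the driver library is loadable under its usual name (nvcuda.dll on Windows, libcuda.so.1 on Linux), and it assumes, without confirmation from the package, that plotoptix device indexes follow the same ordinals. The helper names are made up for this example.

```python
import ctypes
import sys


def load_cuda_driver():
    """Try to load the CUDA driver library; return None if it is not available."""
    if sys.platform == "win32":
        loader, names = ctypes.WinDLL, ["nvcuda.dll"]
    else:
        loader, names = ctypes.CDLL, ["libcuda.so.1", "libcuda.so"]
    for name in names:
        try:
            return loader(name)
        except OSError:
            continue
    return None


def list_cuda_devices():
    """Return device names in CUDA ordinal order, [] if init fails, None if no driver."""
    lib = load_cuda_driver()
    if lib is None:
        return None
    if lib.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return []
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []
    names = []
    for ordinal in range(count.value):
        dev = ctypes.c_int(0)
        buf = ctypes.create_string_buffer(256)
        ok = (lib.cuDeviceGet(ctypes.byref(dev), ordinal) == 0
              and lib.cuDeviceGetName(buf, len(buf), dev) == 0)
        names.append(buf.value.decode(errors="replace") if ok else "<unknown>")
    return names


devices = list_cuda_devices()
if devices is None:
    print("CUDA driver library not found")
else:
    for ordinal, name in enumerate(devices):
        print(f"[{ordinal}] {name}")
```

A Radeon board should not appear in this list at all, so on a machine like the one described above the RTX 2080 would be expected at ordinal 0.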

The TkOptiX() constructor should automatically select all compatible boards, at any index. If the import succeeds but the constructor still fails, you can pick the boards explicitly with TkOptiX(devices=[x, y]).
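Since a failed setup is only logged and the constructor still returns an object, it can help to check the result before calling set_data() or show(). The sketch below tries a few device selections in turn. It relies on the private _is_scene_created attribute that appears in the traceback earlier in this thread, which is an internal detail and may change; whether constructing again after a failed attempt is safe is also an assumption. first_working_plot and scene_ready are hypothetical helpers, not part of plotoptix.

```python
def scene_ready(plot):
    """True if the plot object reports that its scene was created.

    Reads the private `_is_scene_created` flag seen in the npoptix.py
    traceback; treat this as an internal detail, not a public API.
    """
    return bool(getattr(plot, "_is_scene_created", False))


def first_working_plot(factory, device_candidates):
    """Call `factory` with each device list until one yields a ready scene.

    `None` in `device_candidates` means "use the automatic selection".
    Returns (plot, devices) or raises RuntimeError if nothing worked.
    """
    for devices in device_candidates:
        plot = factory() if devices is None else factory(devices=devices)
        if scene_ready(plot):
            return plot, devices
    raise RuntimeError("no device selection produced a ready scene")


# Usage (needs plotoptix and a supported GPU, so it is left commented out):
# from plotoptix import TkOptiX
# plot, used = first_working_plot(TkOptiX, [None, [0], [1]])
# print("working device selection:", used)
```

Failing early like this avoids the cascade of "Geometry setup failed" and "Camera setup failed" messages that follow an unsuccessful initial setup.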

Let me know if import works for you now.

Btw., the CUDA toolkit has not been necessary to run the package since v0.7.

robertsulej, May 13 '20 21:05

I am having the same issue on Linux (KDE Neon).

What's really strange is that sometimes, if I omit the blank line between r = 0.02 * np.random.random(n) + 0.002 and plot = TkOptiX(), it works like a charm! I observed that behavior in both the Python console and jupyter-lab, but could not reproduce it consistently...

I also tried to set the device used by the constructor (plot = TkOptiX(devices=[0])), but that didn't help.

plotoptix version: 0.8.0, Python 3.7.6

FloLangenfeld, Jun 11 '20 15:06

Are you also on a dual-GPU, mixed AMD/NVIDIA setup? Or is it a single-board setup?

robertsulej, Jun 11 '20 16:06

@robertsulej Single NVIDIA RTX2080 (that is detected by plotoptix, by the way)

When I run the 0_try_plotoptix.py example, I get slightly more verbose output than @jszym:

        [0]: GeForce RTX 2080
        Selected devices: [CUDAOutputBuffer destructor caught exception: CUDA call (cudaFreeHost(reinterpret_cast<void*>(m_host_zcopy_pixels)) ) failed with error: 'out of memory' (CUDAOutputBuffer.h:87)

CUDAOutputBuffer destructor caught exception: CUDA call (cudaFreeHost(reinterpret_cast<void*>(m_host_zcopy_pixels)) ) failed with error: 'out of memory' (CUDAOutputBuffer.h:87)

[Py-C# interop]
OptiX initialization failed.
CUDA call (cudaFree(0) ) failed with error: 'out of memory' (PathTracer.cpp:498)

PathTracer destructor failed.
[ERROR] (MainThread) Initial setup failed, see errors above.
[ERROR] (MainThread) Geometry setup failed.
[ERROR] (MainThread) Camera setup failed.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/florent/anaconda3/envs/plotoptix/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/florent/anaconda3/envs/plotoptix/lib/python3.7/site-packages/plotoptix/npoptix.py", line 275, in run
    assert self._is_scene_created, "Scene is not ready, see initialization messages."
AssertionError: Scene is not ready, see initialization messages.

[ERROR] (MainThread) Raytracing output startup timed out.
done

Hope this helps

FloLangenfeld, Jun 11 '20 16:06

Thanks! I'll have a look at this part of code and let you know.

robertsulej, Jun 11 '20 17:06

I have not managed to reproduce the crash yet. I have a few systems with Ubuntu, but they are GNOME or console only; none is KDE. However, I added more careful error checks to the initialization, where the problem apparently occurs, so there should be more information in the output now. The code is updated in the repo, and in a couple of days it will go into the next release, along with other updates.

I wonder if anything else is using the device in your configuration. Could you please run nvidia-smi and post the output?

According to the CUDA docs, there are not many ways to fail at that particular point in the code (cudaSetDevice()). It is not necessarily the same problem as the original issue in this thread, but it might be! I'll also ask on the NVIDIA developer forum.

robertsulej, Jun 12 '20 16:06

I ran some tests, and you seem to be right about the GPU being used by other processes. I toggled a few of the processes listed in the nvidia-smi output on and off. Killing one process allowed me to run all the examples without the aforementioned error.

Here is the nvidia-smi output for a state that results in crashes:

Tue Jun 16 11:42:34 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    On   | 00000000:01:00.0 Off |                  N/A |
| 18%   41C    P8    12W / 215W |   1217MiB /  7982MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2291      G   /usr/lib/xorg/Xorg                           535MiB |
|    0      3797      G   /usr/bin/kwin_x11                             78MiB |
|    0      3878      G   /usr/bin/plasmashell                          66MiB |
|    0     10854      G   ...uest-channel-token=14431268807316465132    69MiB |
|    0     11922      G   ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files    86MiB |
|    0     11987      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   310MiB |
|    0     28677      G   ...quest-channel-token=6474003961983026024    58MiB |
+-----------------------------------------------------------------------------+

And here is the output for a state in which the examples do not crash (with Visual Studio Code and Franz shut down; processes 11987 and 28677 are my web browser and email client, respectively):

Tue Jun 16 11:35:46 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    On   | 00000000:01:00.0 Off |                  N/A |
| 18%   46C    P8     9W / 215W |   1035MiB /  7982MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2291      G   /usr/lib/xorg/Xorg                           503MiB |
|    0      3797      G   /usr/bin/kwin_x11                             87MiB |
|    0      3878      G   /usr/bin/plasmashell                          66MiB |
|    0     11987      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   310MiB |
|    0     28677      G   ...quest-channel-token=8765003911945023024    58MiB |
+-----------------------------------------------------------------------------+

Killing only one of these two processes is enough to avoid the errors. I did not test the other two.
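For anyone repeating this comparison, the process tables of two nvidia-smi dumps can be diffed with a few lines of Python. This is a rough sketch: the regular expression assumes the table layout shown above (driver 440.33.01), and other driver versions may print different columns. The sample rows are copied from the two outputs in this comment; parse_processes and stopped_between are names made up for the example.

```python
import re

# One process row of the nvidia-smi table layout shown above, e.g.
# |    0      2291      G   /usr/lib/xorg/Xorg                           535MiB |
_ROW = re.compile(r"\|\s+(\d+)\s+(\d+)\s+([CG+]+)\s+(.+?)\s+(\d+)MiB\s+\|$")


def parse_processes(smi_text):
    """Return {pid: (process_name, used_MiB)} for every process row found."""
    procs = {}
    for line in smi_text.splitlines():
        m = _ROW.match(line.strip())
        if m:
            _gpu, pid, _type, name, mib = m.groups()
            procs[int(pid)] = (name, int(mib))
    return procs


def stopped_between(before_text, after_text):
    """Processes present in `before_text` but missing from `after_text`."""
    before = parse_processes(before_text)
    after = parse_processes(after_text)
    return {pid: before[pid] for pid in sorted(before.keys() - after.keys())}


# Process rows from the failing and the working state reported above.
failing = """
|    0      2291      G   /usr/lib/xorg/Xorg                           535MiB |
|    0      3797      G   /usr/bin/kwin_x11                             78MiB |
|    0      3878      G   /usr/bin/plasmashell                          66MiB |
|    0     10854      G   ...uest-channel-token=14431268807316465132    69MiB |
|    0     11922      G   ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files    86MiB |
|    0     11987      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   310MiB |
|    0     28677      G   ...quest-channel-token=6474003961983026024    58MiB |
"""
working = """
|    0      2291      G   /usr/lib/xorg/Xorg                           503MiB |
|    0      3797      G   /usr/bin/kwin_x11                             87MiB |
|    0      3878      G   /usr/bin/plasmashell                          66MiB |
|    0     11987      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   310MiB |
|    0     28677      G   ...quest-channel-token=8765003911945023024    58MiB |
"""

for pid, (name, mib) in stopped_between(failing, working).items():
    print(pid, name, f"{mib}MiB")
```

On the rows above this reports PIDs 10854 and 11922, which should correspond to the two applications that were shut down.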

HTH

FloLangenfeld, Jun 16 '20 10:06

That's great, thanks so much for the feedback!

@jszym, this might also be a solution in your case.

robertsulej, Jun 16 '20 10:06

Closing since there has been no activity for a long time.

robertsulej, Jun 04 '23 11:06