plotoptix
Setup Error: PathTracer destructor failed.
Hi there!
I seem to be running into an issue when trying to run the example in README.md on Windows with CUDA 10.1.
I'm running a dual-GPU setup: the first GPU is an RX 480 (display) and the second is an RTX 2080 (which I intend to use for rendering). Could this be the issue?
Here's the code I've run and the resulting error:
C:\****\****>python
Python 3.7.5 (default, Oct 31 2019, 15:18:51) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> from plotoptix import TkOptiX
>>>
>>> n = 1000000 # 1M points, better not try this with matplotlib
>>> xyz = 3 * (np.random.random((n, 3)) - 0.5) # random 3D positions
>>> r = 0.02 * np.random.random(n) + 0.002 # random radii
>>>
>>> plot = TkOptiX()
[Py-C# interop]
PathTracer destructor failed.
[ERROR] (MainThread) Initial setup failed, see errors above.
>>> plot.set_data("my plot", xyz, r=r)
[ERROR] (MainThread) Geometry setup failed.
>>> plot.show()
[ERROR] (MainThread) Camera setup failed.
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Users\jszym\Anaconda3\envs\pytorch\lib\threading.py", line 926, in _bootstrap_inner
self.run()
File "C:\Users\jszym\Anaconda3\envs\pytorch\lib\site-packages\plotoptix\npoptix.py", line 262, in run
assert self._is_scene_created, "Scene is not ready, see initialization messages."
AssertionError: Scene is not ready, see initialization messages.
[ERROR] (MainThread) Raytracing output startup timed out.
>>>
Here's the CUDA version
C:\****\****>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
Here are some details about my system
Field | Value |
---|---|
OS Name | Microsoft Windows 10 Pro |
OS Manufacturer | Microsoft Corporation |
System Manufacturer | Micro-Star International Co., Ltd. |
System Model | MS-7B09 |
System Type | x64-based PC |
Processor | AMD Ryzen Threadripper 1950X 16-Core Processor, 3400 Mhz, 32 Logical Processor(s) |
BaseBoard Manufacturer | Micro-Star International Co., Ltd. |
BaseBoard Product | X399 GAMING PRO CARBON AC (MS-7B09) |
BaseBoard Version | 1.0 |
Installed Physical Memory (RAM) | 32.0 GB |
GPU 1 details
Field | Value |
---|---|
Name | Radeon (TM) RX 480 Graphics |
Adapter Type | AMD Radeon Graphics Processor (0x67DF), Advanced Micro Devices, Inc. compatible |
Adapter Description | Radeon (TM) RX 480 Graphics |
Adapter RAM | (1,048,576) bytes |
Driver Version | 26.20.15002.61 |
GPU 2 details
Field | Value |
---|---|
Name | NVIDIA GeForce RTX 2080 |
Adapter Type | GeForce RTX 2080, NVIDIA compatible |
Adapter Description | NVIDIA GeForce RTX 2080 |
Adapter RAM | (1,048,576) bytes |
Driver Version | 26.21.14.3200 |
Also wanted to thank you so much for your work, this is a fantastic project.
Hi Joseph,
Thanks for reporting! Yes, the likely cause is that the NVIDIA board comes after the Radeon. Configuring which board indexes are visible to the package was on my to-do list. Let me implement that and we'll see how it works for you. I plan the next release within a week.
The new release is out, and I hope it fixes the problem. Please try `import plotoptix`: this should list the devices that are compatible with CUDA and meet the minimum compute capability of 5.0. I guess that in a mixed Radeon-NVIDIA configuration the NVIDIA index should be 0 even if it is installed as the second card, but I did not test that. I verified the code on GCP with multiple GPUs, but all of them were NVIDIA boards.
The `TkOptiX()` constructor should automatically select all compatible boards, at any index, but if the import succeeds and you still have trouble with the constructor, you can use `TkOptiX(devices=[x, y])`.
Let me know if import works for you now.
Btw., the CUDA toolkit has not been necessary to run the package since v0.7.
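For anyone trying the `devices` argument mentioned above, here is a minimal sketch, not an official recipe. `open_plot` is a hypothetical helper name, and the index validation is an assumption added for illustration. Only `TkOptiX()` and `TkOptiX(devices=[...])` come from this thread.

```python
def open_plot(devices=None):
    """Create a TkOptiX window, optionally restricted to some device indices.

    `devices` is a list of CUDA device indices as printed on import,
    or None to let the constructor pick all compatible boards.
    """
    if devices is not None:
        # Hypothetical sanity check, done before touching the GPU at all.
        if not all(isinstance(d, int) and d >= 0 for d in devices):
            raise ValueError("device indices must be non-negative integers")

    # Lazy import: per the thread, importing plotoptix lists the
    # CUDA-compatible devices, so it is done only when actually needed.
    from plotoptix import TkOptiX

    if devices:
        return TkOptiX(devices=list(devices))
    return TkOptiX()
```

Calling `open_plot(devices=[0])` would then try only the board listed as `[0]`, while `open_plot()` leaves the selection to the constructor.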
I am having the same issue on Linux (KDE Neon).
What's really strange is that sometimes, if I omit the blank line between `r = 0.02 * np.random.random(n) + 0.002` and `plot = TkOptiX()`, it works like a charm! I observed that behavior both in the Python console and in jupyter-lab, but could not reproduce it consistently...
I also tried to set the device used by the constructor (`plot = TkOptiX(devices=[0])`), but that didn't help.
plotoptix version: 0.8.0, Python 3.7.6
Are you also on a dual-GPU, mixed AMD/NVIDIA setup? Or is it a single-board setup?
@robertsulej Single NVIDIA RTX2080 (that is detected by plotoptix, by the way)
When I run the 0_try_plotoptix.py example, I get slightly more verbose output than @jszym:
[0]: GeForce RTX 2080
Selected devices: [CUDAOutputBuffer destructor caught exception: CUDA call (cudaFreeHost(reinterpret_cast<void*>(m_host_zcopy_pixels)) ) failed with error: 'out of memory' (CUDAOutputBuffer.h:87)
CUDAOutputBuffer destructor caught exception: CUDA call (cudaFreeHost(reinterpret_cast<void*>(m_host_zcopy_pixels)) ) failed with error: 'out of memory' (CUDAOutputBuffer.h:87)
[Py-C# interop]
OptiX initialization failed.
CUDA call (cudaFree(0) ) failed with error: 'out of memory' (PathTracer.cpp:498)
PathTracer destructor failed.
[ERROR] (MainThread) Initial setup failed, see errors above.
[ERROR] (MainThread) Geometry setup failed.
[ERROR] (MainThread) Camera setup failed.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/florent/anaconda3/envs/plotoptix/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/florent/anaconda3/envs/plotoptix/lib/python3.7/site-packages/plotoptix/npoptix.py", line 275, in run
assert self._is_scene_created, "Scene is not ready, see initialization messages."
AssertionError: Scene is not ready, see initialization messages.
[ERROR] (MainThread) Raytracing output startup timed out.
done
Hope this helps
Thanks! I'll have a look at this part of code and let you know.
I have not managed to reproduce the crash yet. I have a few systems with Ubuntu, but they are Gnome or console only; none is KDE. However, I added more careful error checks in the initialization, where the problem apparently occurs, so there should be more information in the output now. The code is updated in the repo, and in a couple of days it will go into the next release along with other updates.
I wonder if anything else is using the device in your configuration. Could you, please, run nvidia-smi and post the output?
According to the CUDA docs, there are not many ways to fail at that particular point in the code (cudaSetDevice()). It is not necessarily the same problem as the original issue in this thread, but it might be! I'll also ask on the NVIDIA developer forum.
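As a quick way to see how much device memory is already taken before starting a render, something like the following sketch may help. It assumes `nvidia-smi` is on the PATH and that it reports numeric values. `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names, not part of plotoptix.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse 'index, used, total' rows (MiB, no header, no units)."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Assumes all three fields are plain integers (no "[N/A]" entries).
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "used_mib": used, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or [] if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` right before `TkOptiX()` gives a rough idea of whether other applications are holding a lot of memory on the board.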
I ran some tests, and you seem to be right about GPU usage by other processes. I toggled a few of the processes from the nvidia-smi output on and off. Killing one process allowed me to run all the examples without the aforementioned error.
Here is the nvidia-smi output for the state that results in crashes:
Tue Jun 16 11:42:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 On | 00000000:01:00.0 Off | N/A |
| 18% 41C P8 12W / 215W | 1217MiB / 7982MiB | 24% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2291 G /usr/lib/xorg/Xorg 535MiB |
| 0 3797 G /usr/bin/kwin_x11 78MiB |
| 0 3878 G /usr/bin/plasmashell 66MiB |
| 0 10854 G ...uest-channel-token=14431268807316465132 69MiB |
| 0 11922 G ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files 86MiB |
| 0 11987 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 310MiB |
| 0 28677 G ...quest-channel-token=6474003961983026024 58MiB |
+-----------------------------------------------------------------------------+
And here is the output for the state that does not produce a crash in the examples (with Visual Studio Code and Franz shut down; processes 11987 and 28677 are my web browser and email client, respectively):
Tue Jun 16 11:35:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 On | 00000000:01:00.0 Off | N/A |
| 18% 46C P8 9W / 215W | 1035MiB / 7982MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2291 G /usr/lib/xorg/Xorg 503MiB |
| 0 3797 G /usr/bin/kwin_x11 87MiB |
| 0 3878 G /usr/bin/plasmashell 66MiB |
| 0 11987 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 310MiB |
| 0 28677 G ...quest-channel-token=8765003911945023024 58MiB |
+-----------------------------------------------------------------------------+
Killing only one of these two processes is enough to avoid the errors. I did not test the other two.
HTH
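To compare such listings more easily, the process table can be pulled out of the plain `nvidia-smi` text with a small parser. This is only a sketch that assumes the table layout of driver 440.xx as posted above (newer drivers print extra columns); `parse_processes` and `SAMPLE` are hypothetical names used for illustration.

```python
import re

# One process row: | GPU  PID  Type  Process name  <N>MiB |
PROCESS_ROW = re.compile(r"^\|\s*(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s+(\d+)MiB\s*\|$")

# A few rows copied from the listing in this thread.
SAMPLE = """\
| 0 2291 G /usr/lib/xorg/Xorg 535MiB |
| 0 3797 G /usr/bin/kwin_x11 78MiB |
| 0 11987 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 310MiB |
"""


def parse_processes(smi_text):
    """Return (pid, name, MiB) tuples, largest memory user first."""
    procs = []
    for line in smi_text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            _gpu, pid, _kind, name, mib = match.groups()
            procs.append((int(pid), name, int(mib)))
    # Sort so the heaviest GPU memory users come first.
    return sorted(procs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    for pid, name, mib in parse_processes(SAMPLE):
        print(f"{pid:>6}  {mib:>5} MiB  {name}")
```

Running the parser on the output captured before and after closing an application makes it easier to see which PID disappeared.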
That's great, thanks so much for the feedback!
@jszym this might be a solution also for your case.
Closing, since there has been no activity for a long time.