
Interactive Training Crashes Immediately

Open saltwick opened this issue 2 years ago • 2 comments

This might be more of an OpenGL setup issue, but it only occurs with interactive rendering. I can use the nerf app fine in headless mode, but when I try to use the GUI I get the following error:

X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  150 (GLX)
  Minor opcode of failed request:  16 (X_GLXVendorPrivate)
  Resource id in failed request:  0x2c00009
  Serial number of failed request:  0
  Current serial number in output stream:  152

I followed the solution in #66, modifying the window config to request the correct OpenGL version, and that got me a step further. I added

config = app.configuration.Configuration()
config.major_version=3
config.minor_version=2
config.profile='core'
window = app.Window(..., config=config)

to wisp/cuda_guard.py. I then hit another issue, where make_default_context() wasn't able to create a context on any of the 1 detected devices, and solved that by setting __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia before running the python script.
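
The two variables above are NVIDIA's PRIME render offload switches. As a minimal sketch, they could also be set from Python rather than on the command line, on the assumption that they only need to be in the environment before any GL/GLX library is loaded. The helper names `nvidia_offload_env` and `apply_nvidia_offload` are hypothetical and not part of wisp or glumpy.

```python
import os

# Environment variables from the post above (NVIDIA PRIME render offload on GLX).
NVIDIA_OFFLOAD_VARS = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def nvidia_offload_env(base_env=None):
    """Return a copy of base_env (default: os.environ) with the offload variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(NVIDIA_OFFLOAD_VARS)
    return env


def apply_nvidia_offload():
    """Set the variables in the current process.

    Assumption: this runs before the first import that loads GL
    (glumpy, PyOpenGL, pycuda GL interop), otherwise it may have no effect.
    """
    os.environ.update(NVIDIA_OFFLOAD_VARS)
```

Calling `apply_nvidia_offload()` at the very top of the entry script is intended to mirror prefixing the command with the two variables; `nvidia_offload_env()` can instead be passed as `env=` to `subprocess.run` when launching the script from another process.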

Now when I run the script, a transparent window pops up, the data is loaded, and training starts but immediately crashes with the following error:

[i] Using PYGLFW_IMGUI (GL 2.1)
2023-01-11 19:37:33,079|    INFO| [i] Using PYGLFW_IMGUI (GL 2.1)
[i] Running at 60 frames/second
2023-01-11 19:37:33,111|    INFO| [i] Running at 60 frames/second
Traceback (most recent call last):
  File "app/nerf/main_nerf.py", line 490, in <module>
    app.run()  # Run in interactive mode
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 248, in run
    app.run()   # App clock should always run as frequently as possible (background tasks should not be limited)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/__init__.py", line 362, in run
    run(duration, framecount)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/__init__.py", line 344, in run
    count = __backend__.process(dt)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/window/backends/backend_glfw_imgui.py", line 448, in process
    window.dispatch_event('on_draw', dt)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/window/event.py", line 396, in dispatch_event
    if getattr(self, event_type)(*args):
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 527, in on_draw
    self.render()     # Render objects uploaded to GPU
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 499, in render
    self._blit_to_gl_renderbuffer(img, depth_img, self.canvas_program, self.cuda_buffer,
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 414, in _blit_to_gl_renderbuffer
    canvas_program.draw(gl.GL_TRIANGLE_STRIP)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/program.py", line 603, in draw
    self.activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/globject.py", line 95, in activate
    self._activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/program.py", line 393, in _activate
    attribute.activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/globject.py", line 95, in activate
    self._activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/variable.py", line 383, in _activate
    gl.glVertexAttribPointer(self.handle, size, gtype, gl.GL_FALSE, stride, offset)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/latebind.py", line 63, in __call__
    return self.wrapperFunction( self.baseFunction, *args, **named )
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/GL/VERSION/GL_2_0.py", line 470, in glVertexAttribPointer
    return baseOperation(
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/latebind.py", line 43, in __call__
    return self._finalCall( *args, **named )
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/wrapper.py", line 1392, in wrapperCall
    raise err
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/wrapper.py", line 1385, in wrapperCall
    result = wrappedOperation( *cArguments )
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/error.py", line 230, in glCheckError
    raise self._errorClass(
OpenGL.error.GLError: GLError(
	err = 1282,
	description = b'invalid operation',
	baseOperation = glVertexAttribPointer,
	pyArgs = (
		0,
		2,
		GL_FLOAT,
		GL_FALSE,
		16,
		c_void_p(None),
	),
	cArgs = (
		0,
		2,
		GL_FLOAT,
		GL_FALSE,
		16,
		c_void_p(None),
	),
	cArguments = (
		0,
		2,
		GL_FLOAT,
		GL_FALSE,
		16,
		c_void_p(None),
	)
)

Has anyone else encountered this?

Setup:

  • Ubuntu 20.04.5 running remotely and connected over RDP
  • CUDA 11.7 / Driver 515.65.01
  • glxinfo -B produces
name of display: :10.0
display: :10  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
   Vendor: Mesa/X.org (0xffffffff)
   Device: llvmpipe (LLVM 12.0.0, 256 bits) (0xffffffff)
   Version: 21.2.6
   Accelerated: no
   Video memory: 61287MB
   Unified memory: no
   Preferred profile: core (0x1)
   Max core profile version: 4.5
   Max compat profile version: 3.1
   Max GLES1 profile version: 1.1
   Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 12.0.0, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 21.2.6
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.1 Mesa 21.2.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

glxgears works completely fine. I can also get the instant-ngp GUI up, but I'm unable to interactively train a model there for other reasons.
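
Note that the glxinfo output above reports llvmpipe with "Accelerated: no", which is Mesa's software rasterizer on display :10 rather than the NVIDIA driver (presumably glxinfo was run without the offload variables). Whether that is the root cause here is an assumption, but a small sketch like the following could flag it before the app starts. The function names are illustrative only.

```python
import subprocess


def parse_glxinfo(text):
    """Collect the 'key: value' lines of `glxinfo -B` output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip section headers such as 'Extended renderer info (...):'
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def is_software_renderer(info):
    """Heuristic: Mesa's llvmpipe/softpipe renderer, or an explicit 'Accelerated: no'."""
    renderer = info.get("OpenGL renderer string", "").lower()
    accelerated = info.get("Accelerated", "").lower()
    return "llvmpipe" in renderer or "softpipe" in renderer or accelerated == "no"


def query_glxinfo():
    """Run `glxinfo -B` against the current DISPLAY (needs the glxinfo binary installed)."""
    result = subprocess.run(
        ["glxinfo", "-B"], capture_output=True, text=True, check=True
    )
    return parse_glxinfo(result.stdout)
```

Running `is_software_renderer(query_glxinfo())` both with and without the offload variables set would show whether the GUI session can reach the GPU at all.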

saltwick • Jan 11 '23 19:01

Hi @saltwick, this one indeed sounds like an OpenGL setup issue (the error "make_default_context() wasn't able to create a context on any of the 1 detected devices" is strong evidence).

  • Is your remote machine connected to some display?
  • Are you able to run any of glumpy's demos? e.g.: https://github.com/davidcox/glumpy/blob/master/demos/demo-cube.py

If the answer to both is YES, my next suggestion is to re-install the conda env carefully. We have a pending PR which simplifies the installation; could you give it a try and see if it helps? (You no longer have to build pycuda manually; Wisp is pip installable now.) https://github.com/NVIDIAGameWorks/kaolin-wisp/pull/105

EDIT: this PR has been merged into main now.

orperel • Jan 15 '23 13:01

Hi again @saltwick! Looking again at #66 it just dawned on me that simply changing the major / minor version in glumpy's app_config.py doesn't actually fix the issue, as the backend ignores the requested versions.

I've issued a new fix with #117: wisp now sets the default GL version to 3.3. If needed, it's also configurable via WispState's renderer.gl_version field (normally you shouldn't need to worry about that).
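
Since a backend may ignore the requested version, it can help to check which version the context was actually granted. The following is a minimal sketch, not wisp's own check: it parses a GL_VERSION string and compares it against the 3.3 default mentioned above. The helper names are hypothetical, and `current_context_version()` assumes PyOpenGL is installed and a GL context is current when it is called.

```python
import re


def parse_gl_version(version_string):
    """Extract (major, minor) from a GL_VERSION string,
    e.g. '4.5 (Core Profile) Mesa 21.2.6' or 'OpenGL ES 3.2 Mesa 21.2.6'."""
    match = re.match(r"\s*(?:OpenGL ES\s+)?(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized GL_VERSION string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string, minimum=(3, 3)):
    """True if the reported version is at least `minimum` (default 3.3)."""
    return parse_gl_version(version_string) >= minimum


def current_context_version():
    """Query the live context; only valid while a GL context is current."""
    from OpenGL import GL  # PyOpenGL, imported lazily so the parser works without it

    raw = GL.glGetString(GL.GL_VERSION)
    if raw is None:
        raise RuntimeError("glGetString returned nothing; is a GL context current?")
    return parse_gl_version(raw.decode() if isinstance(raw, bytes) else raw)
```

With the strings from the glxinfo output earlier in this thread, the core profile ("4.5 (Core Profile) Mesa 21.2.6") passes the 3.3 check while the compatibility profile ("3.1 Mesa 21.2.6") does not.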

orperel • Feb 06 '23 12:02