Running polyscope on remote servers [python]

Open payoto opened this issue 3 years ago • 9 comments

Context

I am trying to run the python polyscope library in a headless environment (running some tests in CI), which has presented some challenges. This issue has the same underlying cause as #82.

Environment

  • docker container
  • OS: ubuntu
  • Python 3.8.6
  • Polyscope: 0.1.6

polyscope installed with: python -m pip install polyscope

Partial fixes

I worked through a couple of issues but couldn't get everything working (yet):

apt-get install -y libglfw3-dev
apt-get install -y xvfb
export DISPLAY=:99.0
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &

The error I get when running python -c "import polyscope as ps; ps.init()" is the following:

[polyscope] Backend: openGL3_glfw -- Loaded openGL version: 3.3 (Core Profile) Mesa 18.3.6
[polyscope] Polyscope OpenGL Error!  Type: Invalid enum
GLError() after shader compilation! Program text:


// tag ${ GLSL_VERSION }$
// from rule: GLSL_VERSION
#version 330 core

      in vec3 a_position;
      out vec2 tCoord;

      void main()
      {
          tCoord = (a_position.xy+vec2(1.0,1.0))/2.0;
          gl_Position = vec4(a_position,1.);
      }

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/polyscope/core.py", line 12, in init
    psb.init(backend)
RuntimeError: OpenGl error occurred. Text: Invalid enum
[1] + Done(1)                    Xvfb :99 -screen 0 1024x768x24 1>/dev/null 2>&1

I've written up the debugging process I've been through so far below and would appreciate any tips on where to go next.

Thanks for an already fantastic project! Keep up the great work 😀

Missing glfw

The first error you might encounter when running polyscope in a headless environment is a missing glfw. The error looks like issue #82: running the command python -c "import polyscope as ps; ps.init()" throws the following error:

backend = ''

    def init(backend=""):
        """Initialize Polyscope"""
    
        cwd_before = os.getcwd() # see note below
    
>       psb.init(backend)
E       RuntimeError: [polyscope] ERROR: Failed to initialize glfw

/opt/conda/envs/test/lib/python3.8/site-packages/polyscope/core.py:12: RuntimeError

Fix on ubuntu

The fix for that is to install glfw:

sudo apt-get install libglfw3

Missing display

Re-running the command python -c "import polyscope as ps; ps.init()" throws the following error:

GLFW emitted error: X11: The DISPLAY environment variable is missing
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/polyscope/core.py", line 12, in init
    psb.init(backend)
RuntimeError: [polyscope] ERROR: Failed to initialize glfw

Adding a display (to /dev/null)

apt-get install -y xvfb
export DISPLAY=:99.0
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &

(credit for this fix to: https://github.com/pyvista/gl-ci-helpers)
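
For test suites that are launched from Python rather than from a shell script, the same Xvfb setup can be sketched with the standard library. This is a minimal sketch, not part of polyscope: the helper names `xvfb_command` and `start_xvfb` are made up for illustration, and it assumes the `Xvfb` binary from the `xvfb` package is on the PATH.

```python
import os
import shutil
import subprocess


def xvfb_command(display=":99", screen="1024x768x24"):
    """Build the same Xvfb invocation used in the shell commands above."""
    return ["Xvfb", display, "-screen", "0", screen]


def start_xvfb(display=":99", screen="1024x768x24"):
    """Start Xvfb in the background and point DISPLAY at it.

    Hypothetical helper: returns the Popen handle so the caller can
    terminate the virtual display when the tests are done.
    """
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb not found; install it first (apt-get install -y xvfb)")
    proc = subprocess.Popen(
        xvfb_command(display, screen),
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null 2>&1"
        stderr=subprocess.DEVNULL,
    )
    # Matches "export DISPLAY=:99.0" for the default display number.
    os.environ["DISPLAY"] = f"{display}.0"
    return proc
```

Xvfb needs a moment to come up, so a short wait or retry before the first `ps.init()` call may be necessary. Call `proc.terminate()` on the returned handle once the run is finished.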

OpenGL error

Re-running the command python -c "import polyscope as ps; ps.init()" throws the following error:

[polyscope] Backend: openGL3_glfw -- Loaded openGL version: 3.3 (Core Profile) Mesa 18.3.6
[polyscope] Polyscope OpenGL Error!  Type: Invalid enum
GLError() after shader compilation! Program text:


// tag ${ GLSL_VERSION }$
// from rule: GLSL_VERSION
#version 330 core

      in vec3 a_position;
      out vec2 tCoord;

      void main()
      {
          tCoord = (a_position.xy+vec2(1.0,1.0))/2.0;
          gl_Position = vec4(a_position,1.);
      }

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/polyscope/core.py", line 12, in init
    psb.init(backend)
RuntimeError: OpenGl error occurred. Text: Invalid enum
[1] + Done(1)                    Xvfb :99 -screen 0 1024x768x24 1>/dev/null 2>&1

At this point, I've not found a fix yet.

payoto avatar Mar 26 '21 11:03 payoto

Awesome! Thank you for writing up your progress already, I'm sure this will be super helpful for folks. It would be awesome to add a page on the docs with instructions for running on remote servers with this info once it is all sorted out.

As to next debugging steps, my first guess would have been that this issue is related to https://github.com/nmwsharp/polyscope/issues/70, since you're also using Mesa drivers. But that should already be resolved as of Polyscope-py 0.1.6, so if you're running on latest that should not be the case.

I'm not totally sure what the easiest way is to get better debug information in your configuration. The most useful thing would probably be to compile the bindings in Debug mode, but that can be a headache to do on a CI server. I was thinking that in the next release, I might try to add a Python-side flag that enables additional debug checks in the underlying C++ code. Perhaps that would help here as well?

nmwsharp avatar Mar 30 '21 16:03 nmwsharp

If you're up to it, the most actionable thing to do is to try cloning & building the C++ library in debug mode and running unit tests in your CI scripts. If you have a C++ toolchain, that should be as simple as dropping in the commands here http://polyscope.run/building/#tests.

Hopefully that will give a more useful error which will help us debug further.

nmwsharp avatar Mar 30 '21 16:03 nmwsharp

So I followed your suggestions and I got some 'interesting' results, no solutions though. Looking through all this I do now have a trivial solution which is to use the "openGL_mock" backend when I run the test suite (really what I should have been doing all along).

I tested the rest and I got mixed results: no issues with the C++ libraries (all tests pass) and the issue appearing intermittently with the python bindings. I think the root cause is some environment niceties between docker and conda. I'm not convinced it's really a problem with polyscope and I'd understand if you wanted to close the issue.
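
As a rough illustration of the mock-backend workaround, a test suite could choose the backend from the environment before calling `ps.init()`. This is only a sketch: `pick_backend`, `init_polyscope_for_tests` and the `POLYSCOPE_TEST_BACKEND` variable are hypothetical names, not part of polyscope. The backend strings and the empty-string default of `init(backend="")` are the ones that appear in this thread.

```python
import os


def pick_backend(environ=None):
    """Return the backend name to pass to ps.init().

    Hypothetical helper: falls back to the mock backend when no X display
    is available, which is the usual situation on a CI machine.
    """
    env = os.environ if environ is None else environ
    # Assumed override variable, so one CI job can still request a real backend.
    override = env.get("POLYSCOPE_TEST_BACKEND")
    if override:
        return override
    if not env.get("DISPLAY"):
        return "openGL_mock"
    return ""  # empty string is the default argument of init()


def init_polyscope_for_tests(environ=None):
    """Initialize polyscope with the backend chosen by pick_backend."""
    import polyscope as ps  # imported lazily so pick_backend works without it

    ps.init(pick_backend(environ))
```

The mock backend only checks that the Python-side calls go through. It does not exercise any real rendering, so it would not catch the OpenGL error described above.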

Building and running the C++ tests

I followed the instructions you sent and successfully built and tested the library, all tests passed with both the "openGL_mock" and "openGL3_glfw" backends.

using

 ./bin/polyscope-test --gtest_catch_exceptions=0 backend=openGL_mock

and

 ./bin/polyscope-test --gtest_catch_exceptions=0 backend=openGL3_glfw

I get 59 tests passing for each.

Python bindings

I then built the python package following the development build instructions: https://polyscope.run/py/installing/#development-builds

Running python test/polyscope_test.py backend=openGL3_glfw, I got the same error as earlier every time but once. I'm using conda to manage python packages and I think there is a link there.

Next step

  • I need to find the exact environment change which causes the init to fail.
  • I want to try and reproduce this error on a machine that is not in Docker.
  • I can put together a minimal Dockerfile to reproduce this error if you are interested.

payoto avatar Apr 06 '21 08:04 payoto

Following up on this issue. I'm facing the exact same set of errors. Everything is the same except I'm not using docker. Is there any update on this? Thanks.

b0ku1 avatar Aug 05 '21 23:08 b0ku1

But it might be my openGL version issue?:

glxinfo | grep "version"
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
Max core profile version: 3.3
Max compat profile version: 3.1
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1
OpenGL core profile version string: 3.3 (Core Profile) Mesa 20.0.8
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.1 Mesa 20.0.8
OpenGL shading language version string: 1.40
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 20.0.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
GL_EXT_shader_implicit_conversions, GL_EXT_shader_integer_mix,

b0ku1 avatar Aug 05 '21 23:08 b0ku1

Hi! Thanks for following up on this again. This is indeed a very important issue, though it has eluded me every time I've tried to dig in!

A quick note: one small thing did change with the most recent Polyscope version: the python bindings now expose a setting to enable error checking around openGL calls. https://polyscope.run/py/basics/program_options/#render-error-checks Enabling this might help generate more useful error messages on the Python side.
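
A sketch of turning that setting on before initialization might look like the following. The setter name `set_enable_render_error_checks` is my reading of the linked program-options page and should be checked against the installed version. The wrapper `init_with_render_checks` and its `ps` parameter are hypothetical, added only so the call order can be tried without a working GL context.

```python
def init_with_render_checks(backend="", ps=None):
    """Enable the extra OpenGL error checks, then initialize polyscope.

    `ps` is the polyscope module. It is a parameter only so that a stand-in
    object can be passed in for a dry run (an illustrative assumption).
    """
    if ps is None:
        import polyscope as ps  # real module, imported lazily

    # Assumed setter name, taken from the program-options docs linked above.
    ps.set_enable_render_error_checks(True)
    ps.init(backend)
```

Calling `init_with_render_checks("openGL3_glfw")` in the failing environment should then, hopefully, give a more specific message than the bare "Invalid enum".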

For the openGL versions, we do definitely need openGL 3.3 Core (or greater) support, but not anything beyond that. I do not expect that anything will work without 3.3 core.
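
To check the 3.3 Core requirement automatically, the `glxinfo` output can be parsed with a few lines of Python. This is a minimal sketch with hypothetical helper names. It only reads the "OpenGL core profile version string" line and checks nothing else about the driver.

```python
import re


def core_profile_version(glxinfo_text):
    """Extract (major, minor) from the core profile version line, or None."""
    match = re.search(
        r"OpenGL core profile version string:\s*(\d+)\.(\d+)", glxinfo_text
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_core_requirement(glxinfo_text, required=(3, 3)):
    """True if the reported core profile version is at least `required`."""
    version = core_profile_version(glxinfo_text)
    return version is not None and version >= required
```

The text can come from something like `subprocess.run(["glxinfo"], capture_output=True, text=True).stdout`, run under the same DISPLAY that polyscope will use.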

nmwsharp avatar Aug 07 '21 15:08 nmwsharp

I will try to set up a reproducing case on my end soon and dig deep, but I don't have time immediately! Any info you can share here will only help :)

nmwsharp avatar Aug 07 '21 15:08 nmwsharp

I think it might be an issue when trying to use xvfb on a remote server (xvfb forces using Mesa instead of the openGL that comes with the nvidia driver). So I'm looking into using x11vnc instead. It might take longer to set up, but I'll let you know if that resolves the issue. Thanks. As for the openGL version, it turns out that the program was probably using 3.3 under xvfb, judging from the polyscope output. I'll try enabling error checking and see what it says.

I think it would be awesome to support headless rendering, but I'm not sure how complex that will be.

ps: I'm thinking maybe EGL is more server/headless friendly than glfw? not sure.

b0ku1 avatar Aug 08 '21 05:08 b0ku1

I encountered a similar problem and found that enabling access from non-network local connections works for me:

sudo xhost +local:*

When you are done, turn it off:

sudo xhost -local:*

Also, I'm using nvcr.io/nvidia/cudagl:11.3.0-devel-ubuntu20.04 as my base container, which comes with GL drivers and libraries installed.

shenfy avatar Mar 08 '23 06:03 shenfy