polyscope
Running polyscope on remote servers [python]
context
I am trying to run the Python polyscope library in a headless environment (running some tests in CI), but that has presented some challenges. This issue has the same underlying cause as #82.
Environment
- docker container
- OS: ubuntu
- Python 3.8.6
- Polyscope: 0.1.6
polyscope installed with: python -m pip install polyscope
Partial fixes
I worked through a couple of issues but couldn't get everything working (yet):
apt-get install -y libglfw3-dev
apt-get install -y xvfb
export DISPLAY=:99.0
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
The error I get when running python -c "import polyscope as ps; ps.init()"
is the following:
[polyscope] Backend: openGL3_glfw -- Loaded openGL version: 3.3 (Core Profile) Mesa 18.3.6
[polyscope] Polyscope OpenGL Error! Type: Invalid enum
GLError() after shader compilation! Program text:
// tag ${ GLSL_VERSION }$
// from rule: GLSL_VERSION
#version 330 core
in vec3 a_position;
out vec2 tCoord;
void main()
{
tCoord = (a_position.xy+vec2(1.0,1.0))/2.0;
gl_Position = vec4(a_position,1.);
}
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/polyscope/core.py", line 12, in init
psb.init(backend)
RuntimeError: OpenGl error occurred. Text: Invalid enum
[1] + Done(1) Xvfb :99 -screen 0 1024x768x24 1>/dev/null 2>&1
I've written up the debugging process I've been through so far below, and would appreciate any tips on where to go next.
Thanks for an already fantastic project! Keep up the great work 😀
Missing glfw
The first error you might encounter when running polyscope in a headless environment is a missing glfw. The error looks like the one in issue #82.
Running the command: python -c "import polyscope as ps; ps.init()"
throws the following error:
backend = ''
def init(backend=""):
"""Initialize Polyscope"""
cwd_before = os.getcwd() # see note below
> psb.init(backend)
E RuntimeError: [polyscope] ERROR: Failed to initialize glfw
/opt/conda/envs/test/lib/python3.8/site-packages/polyscope/core.py:12: RuntimeError
Fix on ubuntu
The fix is to install glfw:
sudo apt-get install libglfw3
Missing display
re-running the command: python -c "import polyscope as ps; ps.init()"
throws the following error:
GLFW emitted error: X11: The DISPLAY environment variable is missing
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/polyscope/core.py", line 12, in init
psb.init(backend)
RuntimeError: [polyscope] ERROR: Failed to initialize glfw
Adding a display (to /dev/null)
apt-get install -y xvfb
export DISPLAY=:99.0
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
(credit for this fix to: https://github.com/pyvista/gl-ci-helpers)
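For CI scripts driven from Python, the same Xvfb setup can be wrapped in a context manager. This is only a sketch: the helper names xvfb_command and virtual_display are hypothetical, and it assumes the Xvfb binary is on the PATH and that display :99 is free.

```python
import os
import shutil
import subprocess
from contextlib import contextmanager


def xvfb_command(display=":99", width=1024, height=768, depth=24):
    """Build the same Xvfb invocation as the shell commands above."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


@contextmanager
def virtual_display(display=":99"):
    """Start Xvfb, point DISPLAY at it, and clean up afterwards."""
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb not found; try `apt-get install -y xvfb`")
    proc = subprocess.Popen(
        xvfb_command(display),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = f"{display}.0"  # matches `export DISPLAY=:99.0`
    try:
        yield proc
    finally:
        # Restore the previous DISPLAY and stop the virtual X server.
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous
        proc.terminate()
        proc.wait()
```

Usage would be `with virtual_display(): ...` around whatever needs a display. Xvfb may take a moment to start accepting connections, so a short wait or retry inside the block may be needed.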
OpenGL error
re-running the command: python -c "import polyscope as ps; ps.init()"
throws the following error:
[polyscope] Backend: openGL3_glfw -- Loaded openGL version: 3.3 (Core Profile) Mesa 18.3.6
[polyscope] Polyscope OpenGL Error! Type: Invalid enum
GLError() after shader compilation! Program text:
// tag ${ GLSL_VERSION }$
// from rule: GLSL_VERSION
#version 330 core
in vec3 a_position;
out vec2 tCoord;
void main()
{
tCoord = (a_position.xy+vec2(1.0,1.0))/2.0;
gl_Position = vec4(a_position,1.);
}
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/polyscope/core.py", line 12, in init
psb.init(backend)
RuntimeError: OpenGl error occurred. Text: Invalid enum
[1] + Done(1) Xvfb :99 -screen 0 1024x768x24 1>/dev/null 2>&1
At this point, I've not found a fix yet.
Awesome! Thank you for writing up your progress already, I'm sure this will be super helpful for folks. It would be awesome to add a page on the docs with instructions for running on remote servers with this info once it is all sorted out.
As to next debugging steps, my first guess would have been that this issue is related to https://github.com/nmwsharp/polyscope/issues/70, since you're also using Mesa drivers. But that should already be resolved as of Polyscope-py 0.1.6, so if you're running the latest version that should not be the case.
I'm not totally sure what the easiest way is to get better debug information in your configuration. The most useful thing would probably be to compile the bindings in Debug mode, but that can be a headache to do on a CI server. I was thinking that in the next release, I might try to add a Python-side flag that enables additional debug checks in the underlying C++ code. Perhaps that would help here as well?
If you're up to it, the most actionable thing to do is to try cloning & building the C++ library in debug mode and running unit tests in your CI scripts. If you have a C++ toolchain, that should be as simple as dropping in the commands here http://polyscope.run/building/#tests.
Hopefully that will give a more useful error which will help us debug further.
So I followed your suggestions and got some 'interesting' results, but no solutions. Looking through all this, I do now have a trivial workaround, which is to use the "openGL_mock" backend when I run the test suite (really what I should have been doing all along).
I tested the rest and got mixed results: no issues with the C++ library (all tests pass), while the issue appears intermittently with the Python bindings. I think the root cause is some environment subtlety between docker and conda. I'm not convinced it's really a problem with polyscope, and I'd understand if you wanted to close the issue.
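The mock-backend workaround can be made automatic in a test suite. The sketch below is an assumption about how one might wire it up, not part of polyscope: pick_backend and init_polyscope are hypothetical helpers, and the only polyscope call used is init(backend) with the two backend names mentioned in this thread.

```python
import os


def pick_backend(environ=None):
    """Return a Polyscope backend name suited to the current environment."""
    environ = os.environ if environ is None else environ
    # No X display available: fall back to the mock backend, which is what
    # worked for the headless test suite described above.
    if not environ.get("DISPLAY"):
        return "openGL_mock"
    return "openGL3_glfw"


def init_polyscope(environ=None):
    """Initialize Polyscope with the backend chosen by pick_backend."""
    import polyscope as ps  # imported lazily so pick_backend works without it

    backend = pick_backend(environ)
    ps.init(backend)
    return backend
```

Keying on DISPLAY is a design choice for illustration; a CI-specific environment variable would work just as well.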
Building and running the C++ tests
I followed the instructions you sent and successfully built and tested the library; all tests passed with both the "openGL_mock" and "openGL3_glfw" backends.
using
./bin/polyscope-test --gtest_catch_exceptions=0 backend=openGL_mock
and
./bin/polyscope-test --gtest_catch_exceptions=0 backend=openGL3_glfw
I get 59 tests passing for each.
Python bindings
I then built the python package following the development build instructions: https://polyscope.run/py/installing/#development-builds
Running python test/polyscope_test.py backend=openGL3_glfw
I got the same error as earlier every time but one. I'm using conda to manage Python packages and I suspect there is a link there.
Next step
- I need to find the exact environment change which causes the init to fail.
- I want to try and reproduce this error on a machine that is not in Docker.
- I can put together a minimal Dockerfile to reproduce this error if you are interested.
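Because the failure is intermittent, it may help to measure how often it occurs before and after each environment change. A minimal sketch, assuming a hypothetical helper count_failures that reruns a command in fresh subprocesses:

```python
import subprocess
import sys


def count_failures(command, runs=10):
    """Run `command` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # Each run is a fresh process, so state from one init cannot leak
        # into the next attempt.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `count_failures([sys.executable, "-c", "import polyscope as ps; ps.init()"])` would give the number of failing inits out of ten in the current environment.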
Following up on this issue. I'm facing the exact same set of errors. Everything is the same except I'm not using docker. Is there any update on this? Thanks.
But might it be an issue with my openGL version?
glxinfo | grep "version"
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
Max core profile version: 3.3
Max compat profile version: 3.1
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1
OpenGL core profile version string: 3.3 (Core Profile) Mesa 20.0.8
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.1 Mesa 20.0.8
OpenGL shading language version string: 1.40
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 20.0.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
GL_EXT_shader_implicit_conversions, GL_EXT_shader_integer_mix,
Hi! Thanks for following up on this again. This is indeed a very important issue, though it has eluded me every time I've tried to dig in!
A quick note: one small thing did change with the most recent Polyscope version: the python bindings now expose a setting to enable error checking around openGL calls. https://polyscope.run/py/basics/program_options/#render-error-checks Enabling this might help generate more useful error messages on the Python side.
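Enabling the checks might look like the sketch below. It assumes the setter is named set_enable_render_error_checks, as I understand the linked docs page to list it, and that it should be called before init so the checks cover initialization; init_with_error_checks is a hypothetical wrapper. Please verify both assumptions against the docs for your installed version.

```python
def init_with_error_checks(ps, backend=""):
    """Turn on render error checks, then initialize Polyscope.

    `ps` is the imported polyscope module; it is passed in so this helper
    can be exercised without a working OpenGL context.
    """
    # Assumption: setter name taken from the linked program-options page.
    ps.set_enable_render_error_checks(True)
    ps.init(backend)
```

Typical use would be `import polyscope as ps` followed by `init_with_error_checks(ps)`.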
For the openGL versions, we do definitely need openGL 3.3 Core (or greater) support, but not anything beyond that. I do not expect that anything will work without 3.3 core.
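The 3.3 core requirement can be checked from glxinfo output before Polyscope is started at all. A small sketch with hypothetical helper names, assuming the output contains an "OpenGL core profile version string" line like the one posted above:

```python
import re


def core_profile_version(glxinfo_text):
    """Extract (major, minor) of the OpenGL core profile from glxinfo output."""
    match = re.search(
        r"OpenGL core profile version string:\s*(\d+)\.(\d+)", glxinfo_text
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_core_requirement(glxinfo_text, required=(3, 3)):
    """True if the reported core profile version is at least `required`."""
    version = core_profile_version(glxinfo_text)
    return version is not None and version >= required
```

Note that the plain "OpenGL version string" line reports the compatibility profile (3.1 in the output above), so it is the core profile line that matters here.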
I will try to set up a reproducing case on my end soon and dig deep, but I don't have time immediately! Any info you can share here will only help :)
I think it might be an issue with using xvfb on a remote server (xvfb forces the use of Mesa instead of the openGL that comes with the nvidia driver). So I'm looking into using x11vnc instead. It might take longer to set up, but I'll let you know if that resolves the issue. Thanks. As for the openGL version, it turns out that the program was probably using 3.3 under xvfb, judging from the polyscope output. I'll try enabling error checking and see what it says.
I think it would be awesome to support headless rendering, but I'm not sure how complex that will be.
PS: I'm thinking maybe EGL is more server/headless friendly than glfw? Not sure.
I encountered a similar problem, and found that enabling access from non-network local connections works for me:
sudo xhost +local:*
and when you are done, turn it off:
sudo xhost -local:*
Also I'm using nvcr.io/nvidia/cudagl:11.3.0-devel-ubuntu20.04 as my base container, which comes with GL drivers and libraries installed.
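To make sure the xhost access is always turned off again, the two commands above can be paired in a context manager. This is a sketch under assumptions: local_x_access is a hypothetical helper, and the sudo prefix mirrors the commands as posted (pass prefix=() if sudo is not needed).

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def local_x_access(run=subprocess.run, prefix=("sudo",)):
    """Allow local non-network X connections for the duration of the block."""
    run([*prefix, "xhost", "+local:*"], check=True)
    try:
        yield
    finally:
        # Always revoke access again, even if the body raised.
        run([*prefix, "xhost", "-local:*"], check=True)
```

Usage would be `with local_x_access(): ...` around the code that calls ps.init().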