gdrnpp_bop2022

EGL error when starting to train gdrnpp

Open yinguoxiangyi opened this issue 2 years ago • 5 comments

Training command

(base) root@a3c636c20700:/workspace/gdrnpp_bop2022# CUDA_VISIBLE_DEVICES=0 python ./core/gdrn_modeling/main_gdrn.py     --config-file configs/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless.py --num-gpus 1 --opts MODEL.WEIGHTS=output/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless/model_final_wo_optim.pth --resume

Error log

20221220_030409|d2.utils.env@41: Using a generated random seed 10091570
20221220_030409|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 556faa62b6a0 at 0x7f9a566023d0>
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 556fab0e1ee0 at 0x7f9a566de0d0>
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=5, microseconds=660273), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_MATCH (12297), baseOperation = eglCreateContext ), traceback=<traceback object at 0x7f9b05980550>), 'extra': {}, 'file': (name='main_gdrn.py', path='./core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (132473), thread 'MainThread' (140308821803200):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=132473, name='MainProcess'), 'thread': (id=140308821803200, name='MainThread'), 'time': datetime(2022, 12, 20, 3, 4, 9, 967196, tzinfo=datetime.timezone(datetime.timedelta(0), 'UTC'))}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
    return function(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 205, in main
    ).run(args, cfg)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 402, in _run_impl
    return run_method(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 409, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 290, in get_renderer
    use_cache=True,
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 218, in init_context
    self._egl_context = eglCreateContext(self._egl_display, configs[0], EGL_NO_CONTEXT, context_attributes)
  File "/opt/conda/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_MATCH,
        baseOperation = eglCreateContext,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f9a55df9680>,
                <OpenGL._opaque.EGLConfig_pointer object at 0x7f9a55e599e0>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f9a5a804b00>,
                <OpenGL.arrays.lists.c_int_Array_7 object at 0x7f9a55e59cb0>,
        ),
        result = <OpenGL._opaque.EGLContext_pointer object at 0x7f9b05ae2680>
)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_handler.py", line 175, in emit
    self._queue.put(str_record)
  File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 358, in put
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/conda/lib/python3.7/site-packages/loguru/_recattrs.py", line 73, in __reduce__
    pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---
--- Logging error in Loguru Handler #3 ---
Record was: {'elapsed': datetime.timedelta(seconds=5, microseconds=660273), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_MATCH (12297), baseOperation = eglCreateContext ), traceback=<traceback object at 0x7f9b05980550>), 'extra': {}, 'file': (name='main_gdrn.py', path='./core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (132473), thread 'MainThread' (140308821803200):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=132473, name='MainProcess'), 'thread': (id=140308821803200, name='MainThread'), 'time': datetime(2022, 12, 20, 3, 4, 9, 967196, tzinfo=datetime.timezone(datetime.timedelta(0), 'UTC'))}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
    return function(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 205, in main
    ).run(args, cfg)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 402, in _run_impl
    return run_method(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 409, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 290, in get_renderer
    use_cache=True,
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 218, in init_context
    self._egl_context = eglCreateContext(self._egl_display, configs[0], EGL_NO_CONTEXT, context_attributes)
  File "/opt/conda/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_MATCH,
        baseOperation = eglCreateContext,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f9a55df9680>,
                <OpenGL._opaque.EGLConfig_pointer object at 0x7f9a55e599e0>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f9a5a804b00>,
                <OpenGL.arrays.lists.c_int_Array_7 object at 0x7f9a55e59cb0>,
        ),
        result = <OpenGL._opaque.EGLContext_pointer object at 0x7f9b05ae2680>
)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_handler.py", line 175, in emit
    self._queue.put(str_record)
  File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 358, in put
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/conda/lib/python3.7/site-packages/loguru/_recattrs.py", line 73, in __reduce__
    pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---

I noticed that @wangg12 solved this error in this comment: https://github.com/DLR-RM/AugmentedAutoencoder/issues/19#issuecomment-522597053. Would you mind sharing the solution?

yinguoxiangyi • Dec 20 '22 03:12
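For readers scanning the log above: the number in `EGL_BAD_MATCH (12297)` is the raw EGL error code (0x3009). Below is a minimal sketch, assuming the standard values from the Khronos `egl.h` header, that maps such numbers back to names. The helper `egl_error_name` is hypothetical and is not part of gdrnpp_bop2022 or PyOpenGL.

```python
# Standard EGL error codes, as defined in the Khronos egl.h header.
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def egl_error_name(code: int) -> str:
    """Return the symbolic name for a raw EGL error code (hypothetical helper)."""
    return EGL_ERROR_NAMES.get(code, f"UNKNOWN_EGL_ERROR(0x{code:04X})")


if __name__ == "__main__":
    # 12297 is the value shown in the Loguru record of this issue.
    print(egl_error_name(12297))
```

This only decodes the code; it does not explain why `eglCreateContext` rejected the config on a given machine.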

I have a similar issue when running GDRN training in Docker; I don't encounter it when running on a local Ubuntu install.

20221220_113920|core.utils.default_args_setup@144: Full config saved to output/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv.py
20221220_113920|d2.utils.env@41: Using a generated random seed 22601290
20221220_113920|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_113920|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 6774f80 at 0x7efd3cf29220>
20221220_113920|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 71febf0 at 0x7efd3cdb1640>
ane tro tro
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=16, microseconds=268156), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_DISPLAY (EGL_BAD_DISPLAY), baseOperation = eglInitialize ), traceback=<traceback object at 0x7efd3cc015c0>), 'extra': {}, 'file': (name='main_gdrn.py', path='core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (1769891), thread 'MainThread' (139632321048896):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=1769891, name='MainProcess'), 'thread': (id=139632321048896, name='MainThread'), 'time': datetime(2022, 12, 20, 11, 39, 20, 857957, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET'))}
Traceback (most recent call last):
  File "/opt/conda/envs/gdrn/lib/python3.8/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
    return function(*args, **kwargs)
  File "core/gdrn_modeling/main_gdrn.py", line 199, in main
    Lite(
  File "/opt/conda/envs/gdrn/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 405, in _run_impl
    return run_method(*args, **kwargs)
  File "/opt/conda/envs/gdrn/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 412, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
  File "core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 279, in get_renderer
    ren = EGLRenderer(
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 83, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 208, in init_context
    assert eglInitialize(self._egl_display, major, minor)
  File "/opt/conda/envs/gdrn/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "/opt/conda/envs/gdrn/lib/python3.8/site-packages/OpenGL/error.py", line 230, in glCheckError
    raise self._errorClass(
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_DISPLAY,
        baseOperation = eglInitialize,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7efd3cea13c0>,
                c_long(0),
                c_long(0),
        ),
        result = 0
)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_handler.py", line 175, in emit
    self._queue.put(str_record)
  File "/opt/conda/envs/gdrn/lib/python3.8/multiprocessing/queues.py", line 362, in put
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/envs/gdrn/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/conda/envs/gdrn/lib/python3.8/site-packages/loguru/_recattrs.py", line 73, in __reduce__
    pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---

Basilel7 • Dec 20 '22 11:12
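One difference between a container and a local install that may be worth ruling out (it is not confirmed as the cause in this thread): with the NVIDIA Container Toolkit, the driver libraries mounted into a container are selected by the `NVIDIA_DRIVER_CAPABILITIES` environment variable, and the OpenGL/EGL libraries are only mounted when it contains `graphics` or `all`. A minimal sketch of that check follows; the helper name `has_graphics_capability` is hypothetical.

```python
import os
from typing import Mapping, Optional


def has_graphics_capability(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if NVIDIA_DRIVER_CAPABILITIES requests the graphics libraries.

    Hypothetical helper: it only inspects the environment variable and does
    not verify that the libraries were actually mounted into the container.
    """
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    caps = {c.strip().lower() for c in raw.split(",") if c.strip()}
    return "all" in caps or "graphics" in caps


if __name__ == "__main__":
    if has_graphics_capability():
        print("graphics capability requested")
    else:
        print("graphics capability NOT requested; NVIDIA EGL may be unavailable")
```

If the check fails inside the container, one option is to start it with `-e NVIDIA_DRIVER_CAPABILITIES=all` and try again.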

Maybe you should build the EGL renderer inside the Docker environment.

shanice-l • Jan 29 '23 04:01

I encountered the same error as @yinguoxiangyi. I built a Docker image with Ubuntu 18.04 and CUDA 11.3 (nvidia/cuda:11.3.0-cudnn8-devel-ubuntu18.04). I successfully installed both the dependencies (sh scripts/install_deps.sh) and the EGL renderer (sh compile_cpp_egl_renderer.sh). However, when I run

cd gdrnpp_bop2022
python -m lib.egl_renderer.egl_renderer_v3

I get the following error:

libEGL warning: DRI2: failed to create dri screen
libEGL warning: Not allowed to force software rendering when API explicitly selects a hardware device.
libEGL warning: DRI2: failed to create dri screen
Traceback (most recent call last):
  File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 1422, in <module>
    use_cache=True,
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 208, in init_context
    assert eglInitialize(self._egl_display, major, minor)
  File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
	err = EGL_NOT_INITIALIZED,
	baseOperation = eglInitialize,
	cArguments = (
		<OpenGL._opaque.EGLDisplay_pointer object at 0x7ff44f8fa440>,
		c_long(0),
		c_long(0),
	),
	result = 0
)

Here is the Docker image I'm using:

docker pull federicovasile/ubuntu18-cuda11.3-gdrnpp

You can start a container with:

docker run -it --name gdrnpp-workspace -p 6080:6080 --shm-size=8gb --gpus all --privileged -v /dev:/dev -v /YOUR/PATH/HERE/datasets:/root/datasets:ro federicovasile/ubuntu18-cuda11.3-gdrnpp bash

WARNING: please review the docker run command above before running it. For instance:

  • do not overuse --privileged and -v /dev:/dev
  • the datasets folder is mounted as a read-only volume; insert the correct path, e.g. -v /home/federicovasile/datasets:/root/datasets:ro
  • -p 6080:6080 exposes the port for the VNC client (more info below)

When inside the container:

cd /root/pose-estimation/gdrnpp_bop2022
conda activate gdrnpp_bop2022
python -m lib.egl_renderer.egl_renderer_v3

The image also comes with a VNC client that provides a desktop GUI. Inside the container, run start-vnc-session.sh, then visit localhost:6080 in your browser.

@wangg12 @shanice-l thank you for the nice work; I look forward to your help.

PS: given the requests and common interest in #14 and #12, I'm planning to release a fully working Docker image and the complete inference pipeline (YOLOX + GDR-Net) to the community.

FedericoVasile1 • Mar 27 '23 10:03
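The `libEGL warning: DRI2: failed to create dri screen` lines in the log above use the message prefix that Mesa's libEGL prints, which suggests (but does not prove) that the loader ended up on Mesa rather than on the NVIDIA EGL implementation. On systems using libglvnd, EGL vendors are discovered through JSON files, by default under `/etc/glvnd/egl_vendor.d` and `/usr/share/glvnd/egl_vendor.d`. The sketch below lists those files and the library each one declares; the helper `list_egl_vendors` is hypothetical, and the directory list assumes a default libglvnd setup.

```python
import json
from pathlib import Path
from typing import Dict, Iterable

# Default libglvnd search directories (assumption: no custom override is set).
DEFAULT_VENDOR_DIRS = ("/etc/glvnd/egl_vendor.d", "/usr/share/glvnd/egl_vendor.d")


def list_egl_vendors(dirs: Iterable[str] = DEFAULT_VENDOR_DIRS) -> Dict[str, str]:
    """Map each EGL vendor JSON file to the library_path it declares."""
    vendors: Dict[str, str] = {}
    for d in dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue  # directory missing in this image
        for path in sorted(directory.glob("*.json")):
            try:
                data = json.loads(path.read_text())
                lib = data.get("ICD", {}).get("library_path", "")
            except (OSError, ValueError, AttributeError):
                lib = ""  # unreadable or malformed vendor file
            vendors[str(path)] = lib
    return vendors


if __name__ == "__main__":
    found = list_egl_vendors()
    if not found:
        print("no EGL vendor files found")
    for path, lib in found.items():
        print(f"{path} -> {lib}")
```

If only a Mesa entry shows up inside the container and no NVIDIA one does, that would be consistent with the warnings above.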

Hi @FedericoVasile1, have you tried it with a cudagl base image, such as FROM nvidia/cudagl:11.3.0-devel-ubuntu20.04?

hoenigpeter • Apr 03 '23 15:04

cudagl Docker images should work; they do, at least on my side.

wangg12 • Sep 04 '24 00:09
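To check whether a given image (cudagl-based or not) can initialize EGL at all before launching a full training run, a small standalone probe can help. The sketch below calls the core EGL functions `eglGetDisplay`, `eglInitialize`, `eglQueryString` and `eglTerminate` through `ctypes`. It is an assumption-laden simplification: it uses the default display, whereas the repository's `OffscreenContext` takes a `gpu_id` and may choose its display differently, so a pass here does not guarantee the renderer will work. The function name `egl_smoke_test` is hypothetical.

```python
import ctypes

EGL_VENDOR = 0x3053  # eglQueryString name token from egl.h


def egl_smoke_test(lib_name: str = "libEGL.so.1") -> dict:
    """Try to initialize EGL on the default display and report the vendor."""
    try:
        egl = ctypes.CDLL(lib_name)
    except OSError as exc:
        return {"ok": False, "reason": f"cannot load {lib_name}: {exc}"}

    # Declare signatures so pointers are not truncated on 64-bit systems.
    egl.eglGetDisplay.restype = ctypes.c_void_p
    egl.eglGetDisplay.argtypes = [ctypes.c_void_p]
    egl.eglInitialize.restype = ctypes.c_uint
    egl.eglInitialize.argtypes = [
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_int),
        ctypes.POINTER(ctypes.c_int),
    ]
    egl.eglQueryString.restype = ctypes.c_char_p
    egl.eglQueryString.argtypes = [ctypes.c_void_p, ctypes.c_int]
    egl.eglGetError.restype = ctypes.c_int
    egl.eglGetError.argtypes = []
    egl.eglTerminate.restype = ctypes.c_uint
    egl.eglTerminate.argtypes = [ctypes.c_void_p]

    display = egl.eglGetDisplay(None)  # None stands for EGL_DEFAULT_DISPLAY
    if not display:
        return {"ok": False, "reason": "eglGetDisplay returned EGL_NO_DISPLAY"}

    major, minor = ctypes.c_int(0), ctypes.c_int(0)
    if not egl.eglInitialize(display, ctypes.byref(major), ctypes.byref(minor)):
        code = egl.eglGetError()
        return {"ok": False, "reason": f"eglInitialize failed, error 0x{code:04X}"}

    vendor = egl.eglQueryString(display, EGL_VENDOR)
    egl.eglTerminate(display)
    return {
        "ok": True,
        "version": f"{major.value}.{minor.value}",
        "vendor": (vendor or b"").decode(errors="replace"),
    }


if __name__ == "__main__":
    print(egl_smoke_test())
```

A reported vendor of NVIDIA would indicate that the NVIDIA EGL library is the one being picked up; a Mesa vendor string, or a failure in `eglInitialize`, points back at the image or container setup rather than at the training code.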