gdrnpp_bop2022
EGL error when starting to train gdrnpp
Training command
(base) root@a3c636c20700:/workspace/gdrnpp_bop2022# CUDA_VISIBLE_DEVICES=0 python ./core/gdrn_modeling/main_gdrn.py --config-file configs/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless.py --num-gpus 1 --opts MODEL.WEIGHTS=output/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless/model_final_wo_optim.pth --resume
Error log
20221220_030409|d2.utils.env@41: Using a generated random seed 10091570
20221220_030409|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 556faa62b6a0 at 0x7f9a566023d0>
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 556fab0e1ee0 at 0x7f9a566de0d0>
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=5, microseconds=660273), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_MATCH (12297), baseOperation = eglCreateContext ), traceback=<traceback object at 0x7f9b05980550>), 'extra': {}, 'file': (name='main_gdrn.py', path='./core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (132473), thread 'MainThread' (140308821803200):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=132473, name='MainProcess'), 'thread': (id=140308821803200, name='MainThread'), 'time': datetime(2022, 12, 20, 3, 4, 9, 967196, tzinfo=datetime.timezone(datetime.timedelta(0), 'UTC'))}
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
return function(*args, **kwargs)
File "./core/gdrn_modeling/main_gdrn.py", line 205, in main
).run(args, cfg)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 402, in _run_impl
return run_method(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 409, in _run_with_strategy_setup
return run_method(*args, **kwargs)
File "./core/gdrn_modeling/main_gdrn.py", line 155, in run
renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 290, in get_renderer
use_cache=True,
File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
self._context = OffscreenContext(gpu_id=cuda_device_idx)
File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
self.init_context()
File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 218, in init_context
self._egl_context = eglCreateContext(self._egl_display, configs[0], EGL_NO_CONTEXT, context_attributes)
File "/opt/conda/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
return self( *args, **named )
File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
err = EGL_BAD_MATCH,
baseOperation = eglCreateContext,
cArguments = (
<OpenGL._opaque.EGLDisplay_pointer object at 0x7f9a55df9680>,
<OpenGL._opaque.EGLConfig_pointer object at 0x7f9a55e599e0>,
<OpenGL._opaque.EGLContext_pointer object at 0x7f9a5a804b00>,
<OpenGL.arrays.lists.c_int_Array_7 object at 0x7f9a55e59cb0>,
),
result = <OpenGL._opaque.EGLContext_pointer object at 0x7f9b05ae2680>
)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/loguru/_handler.py", line 175, in emit
self._queue.put(str_record)
File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 358, in put
obj = _ForkingPickler.dumps(obj)
File "/opt/conda/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/opt/conda/lib/python3.7/site-packages/loguru/_recattrs.py", line 73, in __reduce__
pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---
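The trailing ValueError in each logging-error block is a side effect, not the root cause. Per the tracebacks, loguru's handler puts the record on a multiprocessing queue, which pickles the exception. The EGLError still holds raw ctypes pointers in its cArguments, and pickle refuses to serialize those. Below is a minimal standard-library reproduction. The plain RuntimeError carrying a ctypes pointer is a hypothetical stand-in for EGLError, and pickle_error is an illustrative helper, not part of loguru or gdrnpp.

```python
import ctypes
import pickle


def pickle_error(obj):
    """Return the ValueError message if obj cannot be pickled, else None."""
    try:
        pickle.dumps(obj)
    except ValueError as exc:
        return str(exc)
    return None


# Hypothetical stand-in for an EGLError: an exception whose args hold a raw
# ctypes pointer, like the EGLDisplay_pointer listed in cArguments above.
display_handle = ctypes.c_int(0)
raw_error = RuntimeError("EGL_BAD_MATCH", ctypes.pointer(display_handle))

# Workaround idea: keep only a plain-string summary of the error.
safe_error = RuntimeError(repr(raw_error.args[0]))

print(pickle_error(raw_error))   # prints the ctypes ValueError message
print(pickle_error(safe_error))  # prints None: strings pickle fine
```

The error to chase is therefore the EGLError itself (EGL_BAD_MATCH from eglCreateContext here). The pickling failure only adds noise around it.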
I noticed that @wangg12 solved this error in this comment: https://github.com/DLR-RM/AugmentedAutoencoder/issues/19#issuecomment-522597053. Would you mind sharing the solution?
I have a similar issue when running GDRN training in Docker, which I don't encounter when running on my local Ubuntu machine:
20221220_113920|core.utils.default_args_setup@144: Full config saved to output/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv.py
20221220_113920|d2.utils.env@41: Using a generated random seed 22601290
20221220_113920|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_113920|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 6774f80 at 0x7efd3cf29220>
20221220_113920|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 71febf0 at 0x7efd3cdb1640>
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=16, microseconds=268156), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_DISPLAY (EGL_BAD_DISPLAY), baseOperation = eglInitialize ), traceback=<traceback object at 0x7efd3cc015c0>), 'extra': {}, 'file': (name='main_gdrn.py', path='core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (1769891), thread 'MainThread' (139632321048896):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=1769891, name='MainProcess'), 'thread': (id=139632321048896, name='MainThread'), 'time': datetime(2022, 12, 20, 11, 39, 20, 857957, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET'))}
Traceback (most recent call last):
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
return function(*args, **kwargs)
File "core/gdrn_modeling/main_gdrn.py", line 199, in main
Lite(
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 405, in _run_impl
return run_method(*args, **kwargs)
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 412, in _run_with_strategy_setup
return run_method(*args, **kwargs)
File "core/gdrn_modeling/main_gdrn.py", line 155, in run
renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
File "/home2/blongo/gdrn1/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 279, in get_renderer
ren = EGLRenderer(
File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 83, in __init__
self._context = OffscreenContext(gpu_id=cuda_device_idx)
File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
self.init_context()
File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 208, in init_context
assert eglInitialize(self._egl_display, major, minor)
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
return self( *args, **named )
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/OpenGL/error.py", line 230, in glCheckError
raise self._errorClass(
OpenGL.raw.EGL._errors.EGLError: EGLError(
err = EGL_BAD_DISPLAY,
baseOperation = eglInitialize,
cArguments = (
<OpenGL._opaque.EGLDisplay_pointer object at 0x7efd3cea13c0>,
c_long(0),
c_long(0),
),
result = 0
)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_handler.py", line 175, in emit
self._queue.put(str_record)
File "/opt/conda/envs/gdrn_/lib/python3.8/multiprocessing/queues.py", line 362, in put
obj = _ForkingPickler.dumps(obj)
File "/opt/conda/envs/gdrn_/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_recattrs.py", line 73, in __reduce__
pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---
Maybe you should build the EGL renderer inside the Docker environment.
I encountered the same error as @yinguoxiangyi. I built a Docker image with Ubuntu 18.04 and CUDA 11.3 (nvidia/cuda:11.3.0-cudnn8-devel-ubuntu18.04) and successfully installed both the dependencies:
sh scripts/install_deps.sh
and the EGL renderer:
sh compile_cpp_egl_renderer.sh
However, when I run
cd gdrnpp_bop2022
python -m lib.egl_renderer.egl_renderer_v3
I get the following error:
libEGL warning: DRI2: failed to create dri screen
libEGL warning: Not allowed to force software rendering when API explicitly selects a hardware device.
libEGL warning: DRI2: failed to create dri screen
Traceback (most recent call last):
File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 1422, in <module>
use_cache=True,
File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
self._context = OffscreenContext(gpu_id=cuda_device_idx)
File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
self.init_context()
File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 208, in init_context
assert eglInitialize(self._egl_display, major, minor)
File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
return self( *args, **named )
File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
err = EGL_NOT_INITIALIZED,
baseOperation = eglInitialize,
cArguments = (
<OpenGL._opaque.EGLDisplay_pointer object at 0x7ff44f8fa440>,
c_long(0),
c_long(0),
),
result = 0
)
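Three different EGL errors show up in this thread. The first post has EGL_BAD_MATCH from eglCreateContext. The two Docker reports have EGL_BAD_DISPLAY and EGL_NOT_INITIALIZED from eglInitialize.

The sketch below decodes the numeric codes using the values from the Khronos EGL header. For example, 12297 = 0x3009 = EGL_BAD_MATCH, as printed in the first log. It also notes which stage failed.

The stage hints are an assumption drawn only from the call order in these tracebacks. eglInitialize runs before any config or context exists, so a failure there points at the EGL driver or display rather than at the renderer's context attributes. The names EGL_ERROR_NAMES, STAGE_HINTS and explain_egl_error are hypothetical, not part of gdrnpp or PyOpenGL.

```python
# EGL error codes as defined in the Khronos EGL headers (egl.h).
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}

# Hypothetical hints, based only on where each call sits in the tracebacks above.
STAGE_HINTS = {
    "eglInitialize": "display could not be initialized; check which libEGL/driver is loaded",
    "eglCreateContext": "display initialized, but the chosen config/attributes were rejected",
}


def explain_egl_error(code, base_operation):
    """Return a one-line, human-readable summary of an EGL failure."""
    name = EGL_ERROR_NAMES.get(code, f"unknown EGL error 0x{code:04X}")
    hint = STAGE_HINTS.get(base_operation, "no hint for this call")
    return f"{base_operation} -> {name} (0x{code:04X}): {hint}"


print(explain_egl_error(12297, "eglCreateContext"))
print(explain_egl_error(0x3008, "eglInitialize"))
print(explain_egl_error(0x3001, "eglInitialize"))
```
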
Here is the Docker image I'm using:
docker pull federicovasile/ubuntu18-cuda11.3-gdrnpp
You can start a container with:
docker run -it --name gdrnpp-workspace -p 6080:6080 --shm-size=8gb --gpus all --privileged -v /dev:/dev -v /YOUR/PATH/HERE/datasets:/root/datasets:ro federicovasile/ubuntu18-cuda11.3-gdrnpp bash
WARNING: please review the docker run command above before running it. For instance:
- do not abuse --privileged (see here) and -v /dev:/dev
- a read-only volume is mounted for the datasets folder; insert the correct path, e.g. -v /home/federicovasile/datasets:/root/datasets:ro
- -p 6080:6080 is for the VNC client (more info below)
When inside the container:
cd /root/pose-estimation/gdrnpp_bop2022
conda activate gdrnpp_bop2022
python -m lib.egl_renderer.egl_renderer_v3
Moreover, the image comes with a VNC client that provides a desktop GUI. When inside the container, run start-vnc-session.sh, then visit localhost:6080 in your browser.
@wangg12 @shanice-l thank you for the nice work. I look forward to your help.
PS: given the requests and common interest (#14, #12), I'm planning to release the fully working Docker image and the complete inference pipeline (YOLOX + GDR-Net) to the community.
Hi @FedericoVasile1, have you tried it with a cudagl base image, such as: FROM nvidia/cudagl:11.3.0-devel-ubuntu20.04
cudagl Docker images should work, at least on my side.
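There is one plausible reason the cudagl images work where plain nvidia/cuda images fail, though it is an assumption and not confirmed in this thread. The NVIDIA container runtime only mounts its OpenGL/EGL driver libraries when NVIDIA_DRIVER_CAPABILITIES includes graphics (or all). The nvidia/cuda images default to compute,utility. The cudagl images also enable graphics and register NVIDIA as a glvnd EGL vendor.

The "libEGL warning: DRI2: failed to create dri screen" lines above are consistent with Mesa's EGL being picked up instead. The hypothetical helper below checks both conditions from inside a container. The vendor directory is the usual glvnd location and may differ on your image.

```python
import os
from pathlib import Path

# Assumption: the NVIDIA container runtime injects its GL/EGL driver
# libraries only when NVIDIA_DRIVER_CAPABILITIES enables "graphics" (or "all").
GRAPHICS_CAPABILITIES = {"graphics", "all"}

# Assumption: usual glvnd location of EGL vendor JSON files
# (cudagl images are expected to ship an NVIDIA entry here).
DEFAULT_EGL_VENDOR_DIR = "/usr/share/glvnd/egl_vendor.d"


def graphics_enabled(capabilities):
    """True if a comma-separated NVIDIA_DRIVER_CAPABILITIES value enables graphics."""
    caps = {c.strip().lower() for c in capabilities.split(",") if c.strip()}
    return bool(caps & GRAPHICS_CAPABILITIES)


def egl_setup_hints(env=None, vendor_dir=DEFAULT_EGL_VENDOR_DIR):
    """Return hints about a likely-broken headless NVIDIA EGL setup (empty if none)."""
    env = os.environ if env is None else env
    hints = []
    caps = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    if not graphics_enabled(caps):
        hints.append(f"NVIDIA_DRIVER_CAPABILITIES={caps!r} does not enable 'graphics'")
    vendor_path = Path(vendor_dir)
    # is_dir() first, so a missing directory simply counts as "no vendor file"
    has_nvidia_vendor = vendor_path.is_dir() and any(vendor_path.glob("*nvidia*.json"))
    if not has_nvidia_vendor:
        hints.append(f"no NVIDIA EGL vendor JSON found in {vendor_dir}")
    return hints


if __name__ == "__main__":
    for hint in egl_setup_hints() or ["no obvious EGL setup problem found"]:
        print(hint)
```

If only the first hint is printed, adding -e NVIDIA_DRIVER_CAPABILITIES=all to the docker run command may be enough. Switching the base image to nvidia/cudagl, as suggested above, should cover both conditions.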