pytorch3d
pytorch3d copied to clipboard
Very slow rendering for large mesh
Hello, thank you for your work !
Problem I am trying to render many (1024*2048) images from a large mesh. My mesh has ~5M vertices and ~10M faces. The problem is that the rendering is very slow (>30 secs per image) when I think it should be way below 1 sec. My mesh is large (400 Mb) but I know that it should be much faster than this since I can easily visualize it and render it in real time using open3D. I do not care about differentiation here. I have researched all other issues mentionning slow running and tried their solutions but they didn't solve my problem.
Code I ran : The following code simply load my mesh, create a rasterizer with optimized settings and runs the rasterization. I do not include the rendering since I don't really need it yet and the rasterization is the step that seems to be problematic. The 'rasterizer(..)' line takes more than 3 secs on my RTX 3070.
with torch.no_grad():
if torch.cuda.is_available():
device = torch.device("cuda:0")
torch.cuda.set_device(device)
else:
device = torch.device("cpu")
# Get mesh
io = IO()
if isinstance(mesh, str):
logging.info('Trying to read %s as a mesh.', mesh)
if mesh.endswith('.ply'):
mesh = io.load_mesh(path=mesh, device=device)
else:
mesh = mesh.to(device)
meshes = mesh.extend(1)
# Create cameras
cam_poses = np.load(cam_poses_path)
R = cam_poses[0, :3, :3]
T = cam_poses[0, :3, 3]
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
# Create renderer
logging.info('Creating rasterizer.')
raster_settings = RasterizationSettings(
image_size=(1024, 2048),
blur_radius=0.0,
faces_per_pixel=1,
bin_size=None
)
rasterizer = MeshRasterizer(
cameras=cameras,
raster_settings=raster_settings
)
# Render
logging.info('Rendering.')
fragments = rasterizer(meshes, cameras=cameras)
pix_to_face = fragments.pix_to_face
I think your best option in pytorch3d would be to try MeshRasterizerOpenGL which should be much faster.
Thank you for the answer, I tried to use MeshRasterizerOpenGL but got this error :
Traceback (most recent call last):
File "/home/andrew/jean-zay/projects/ia4markings/scripts/seg_3D_by_2D/proj_pytorch3D.py", line 191, in <module>
main(args)
File "/home/andrew/jean-zay/projects/ia4markings/scripts/seg_3D_by_2D/proj_pytorch3D.py", line 49, in main
run_project_3D_to_2D(mesh=args.input_mesh_path,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/jean-zay/projects/ia4markings/scripts/seg_3D_by_2D/proj_pytorch3D.py", line 143, in run_project_3D_to_2D
fragments = rasterizer(meshes, cameras=cameras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/rasterizer_opengl.py", line 205, in forward
pix_to_face, bary_coords, zbuf = self.opengl_machinery(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/rasterizer_opengl.py", line 292, in __call__
pix_to_face, bary_coord, zbuf = self._rasterize_mesh(
^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/rasterizer_opengl.py", line 443, in _rasterize_mesh
_torch_to_opengl(face_verts, self.cuda_context, self.cuda_buffer)
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 439, in _torch_to_opengl
cuda_copy(False)
pycuda._driver.LogicError: cuMemcpy2DUnaligned failed: invalid argument
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuGraphicsUnmapResources failed: invalid OpenGL or DirectX context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuGraphicsUnregisterResource failed: invalid OpenGL or DirectX context
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
Aborted (core dumped)
Have you ever met it with using opengl please ?
I'm sorry I haven't, and I don't know enough to help. Can you run the pytorch3d unit tests and see if the opengl ones work?
Thank you very much for the help !
I tried
python3 -m unittest tests.test_opengl_utils -v
and got :
test_cuda_context (tests.test_opengl_utils.TestDeviceContextStore.test_cuda_context) ... here
ERROR
test_egl_context (tests.test_opengl_utils.TestDeviceContextStore.test_egl_context) ... ERROR
test_multiple_renders_single_gpu_context_store (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_multiple_renders_single_gpu_context_store) ... /home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py:44: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /home/conda/feedstock_root/build_artifacts/libtorch_1706726118919/work/torch/csrc/utils/tensor_new.cpp:1508.)
image = torch.frombuffer(out_buffer, dtype=torch.uint8).reshape(
ok
test_multiple_renders_single_gpu_single_context (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_multiple_renders_single_gpu_single_context) ... ok
test_render_multi_thread_multi_gpu (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_multi_thread_multi_gpu) ... In thread 0, device 0.
In thread 1, device 0.
In thread 2, device 0.
In thread 3, device 0.
In thread 4, device 0.
In thread 5, device 0.
In thread 6, device 0.
In thread 7, device 0.
In thread 8, device 0.
In thread 9, device 0.
ok
test_render_two_threads_single_gpu (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_two_threads_single_gpu) ... ok
test_render_two_threads_single_gpu_context_store (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_two_threads_single_gpu_context_store) ... ok
test_render_two_threads_two_gpus (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_two_threads_two_gpus) ... Exception in thread Thread-16 (_draw_squares_with_context):
Traceback (most recent call last):
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 53, in _draw_squares_with_context
context = EGLContext(MAX_EGL_WIDTH, MAX_EGL_HEIGHT, cuda_device_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 187, in __init__
self.device = _get_cuda_device(self.cuda_device_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 121, in _get_cuda_device
raise ValueError(
ValueError: Found 4 CUDA devices, but none with CUDA id 1.
ERROR
test_render_two_threads_two_gpus_context_store (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_two_threads_two_gpus_context_store) ... Exception in thread Thread-18 (_draw_squares_with_context_store):
Traceback (most recent call last):
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 72, in _draw_squares_with_context_store
context = global_device_context_store.get_egl_context(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 360, in get_egl_context
self._egl_contexts[cuda_device_id] = EGLContext(
^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 187, in __init__
self.device = _get_cuda_device(self.cuda_device_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 121, in _get_cuda_device
raise ValueError(
ValueError: Found 4 CUDA devices, but none with CUDA id 1.
ERROR
test_draw_square (tests.test_opengl_utils.TestOpenGLSingleThreaded.test_draw_square) ... ok
test_render_two_squares (tests.test_opengl_utils.TestOpenGLSingleThreaded.test_render_two_squares) ... ok
test_device_context_store (tests.test_opengl_utils.TestOpenGLUtils.test_device_context_store) ... EGL could not release context on device cuda:0. This can happen if you created two contexts on the same device. Instead, you can use DeviceContextStore to use a single context per device, and EGLContext.make_(in)active_in_current_thread to (in)activate the context as needed.
ok
test_egl_release_error (tests.test_opengl_utils.TestOpenGLUtils.test_egl_release_error) ... EGL could not release context on device cuda:0. This can happen if you created two contexts on the same device. Instead, you can use DeviceContextStore to use a single context per device, and EGLContext.make_(in)active_in_current_thread to (in)activate the context as needed.
ok
test_no_egl_error (tests.test_opengl_utils.TestOpenGLUtils.test_no_egl_error) ... FAIL
test_egl_convert_to_int_array (tests.test_opengl_utils.TestUtils.test_egl_convert_to_int_array) ... ok
test_get_cuda_device (tests.test_opengl_utils.TestUtils.test_get_cuda_device) ... ok
test_load_extensions (tests.test_opengl_utils.TestUtils.test_load_extensions) ... ok
======================================================================
ERROR: test_cuda_context (tests.test_opengl_utils.TestDeviceContextStore.test_cuda_context)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 95, in test_cuda_context
cuda_context_3 = global_device_context_store.get_cuda_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 339, in get_cuda_context
self._cuda_contexts[cuda_device_id] = _init_cuda_context(cuda_device_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 418, in _init_cuda_context
device = cuda.Device(device_id)
^^^^^^^^^^^^^^^^^^^^^^
pycuda._driver.LogicError: cuDeviceGet failed: invalid device ordinal
======================================================================
ERROR: test_egl_context (tests.test_opengl_utils.TestDeviceContextStore.test_egl_context)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 112, in test_egl_context
egl_context_3 = global_device_context_store.get_egl_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 360, in get_egl_context
self._egl_contexts[cuda_device_id] = EGLContext(
^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 187, in __init__
self.device = _get_cuda_device(self.cuda_device_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/pytorch3d/renderer/opengl/opengl_utils.py", line 121, in _get_cuda_device
raise ValueError(
ValueError: Found 4 CUDA devices, but none with CUDA id 1.
======================================================================
ERROR: test_render_two_threads_two_gpus (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_two_threads_two_gpus)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 206, in test_render_two_threads_two_gpus
self._render_two_threads_two_gpus(_draw_squares_with_context)
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 284, in _render_two_threads_two_gpus
result[0]["egl"]["context"].address, result[1]["egl"]["context"].address
~~~~~~~~~^^^^^^^
TypeError: 'NoneType' object is not subscriptable
======================================================================
ERROR: test_render_two_threads_two_gpus_context_store (tests.test_opengl_utils.TestOpenGLMultiThreaded.test_render_two_threads_two_gpus_context_store)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 209, in test_render_two_threads_two_gpus_context_store
self._render_two_threads_two_gpus(_draw_squares_with_context_store)
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 284, in _render_two_threads_two_gpus
result[0]["egl"]["context"].address, result[1]["egl"]["context"].address
~~~~~~~~~^^^^^^^
TypeError: 'NoneType' object is not subscriptable
======================================================================
FAIL: test_no_egl_error (tests.test_opengl_utils.TestOpenGLUtils.test_no_egl_error)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/andrew/jean-zay/projects/ia4markings/pytorch3d_test/tests/test_opengl_utils.py", line 377, in test_no_egl_error
self.assertFalse(_can_import_egl_and_pycuda())
AssertionError: True is not false
----------------------------------------------------------------------
Ran 17 tests in 1.369s
FAILED (failures=1, errors=4)
Ignoring the multi gpu tests, the only failed test that might be relevant is 'test_no_egl_error' I guess, I'm not sure what to do with this result yet though.
I'm not so worried about test_no_egl_error. Can you try tests.test_rasterizer please?
Sure, here is what I got :
Using :
python3 -m unittest tests.test_rasterizer -v
Output :
test_compare_rasterizers (tests.test_rasterizer.TestMeshRasterizer.test_compare_rasterizers) ... An exception occurred in telemetry logging.Disabling telemetry to prevent further exceptions.
Traceback (most recent call last):
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/iopath/common/file_io.py", line 946, in __log_tmetry_keys
handler.log_event()
File "/home/andrew/miniconda3/envs/ia4mk_3/lib/python3.11/site-packages/iopath/common/event_logger.py", line 97, in log_event
del self._evt
^^^^^^^^^
AttributeError: 'NativePathHandler' object has no attribute '_evt'
Time to transform: 0.000804901123046875
Time to set up: 0.004029273986816406
Time to rasterize: 0.0030531883239746094
ok
test_simple_sphere (tests.test_rasterizer.TestMeshRasterizer.test_simple_sphere) ... Time to transform: 0.0007436275482177734
Time to set up: 2.3603439331054688e-05
Time to rasterize: 0.0004773139953613281
Time to transform: 0.0010900497436523438
Time to set up: 3.600120544433594e-05
Time to rasterize: 0.0012364387512207031
Time to transform: 0.0008120536804199219
Time to set up: 3.6716461181640625e-05
Time to rasterize: 0.0005502700805664062
Time to transform: 0.0009360313415527344
Time to set up: 3.528594970703125e-05
Time to rasterize: 0.0005829334259033203
ok
test_simple_sphere_fisheye (tests.test_rasterizer.TestMeshRasterizer.test_simple_sphere_fisheye) ... Time to transform: 0.0007088184356689453
Time to set up: 1.0967254638671875e-05
Time to rasterize: 0.0004017353057861328
Time to transform: 0.015677690505981445
Time to set up: 1.2874603271484375e-05
Time to rasterize: 0.0004665851593017578
Time to transform: 0.001974821090698242
Time to set up: 1.0013580322265625e-05
Time to rasterize: 0.0008959770202636719
ok
test_simple_sphere_opengl (tests.test_rasterizer.TestMeshRasterizer.test_simple_sphere_opengl) ... ok
test_simple_to (tests.test_rasterizer.TestMeshRasterizer.test_simple_to) ... ok
test_check_cameras (tests.test_rasterizer.TestMeshRasterizerOpenGLUtils.test_check_cameras) ... ok
test_check_raster_settings (tests.test_rasterizer.TestMeshRasterizerOpenGLUtils.test_check_raster_settings) ... ok
test_convert_meshes_to_gl_ndc_square_img (tests.test_rasterizer.TestMeshRasterizerOpenGLUtils.test_convert_meshes_to_gl_ndc_square_img) ... ok
test_parse_and_verify_image_size (tests.test_rasterizer.TestMeshRasterizerOpenGLUtils.test_parse_and_verify_image_size) ... ok
test_simple_sphere (tests.test_rasterizer.TestPointRasterizer.test_simple_sphere) ... ok
test_simple_sphere_fisheye_against_perspective (tests.test_rasterizer.TestPointRasterizer.test_simple_sphere_fisheye_against_perspective) ... ok
test_simple_to (tests.test_rasterizer.TestPointRasterizer.test_simple_to) ... ok
----------------------------------------------------------------------
Ran 12 tests in 11.609s
OK
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuGraphicsUnregisterResource failed: invalid OpenGL or DirectX context
By the way, in the meantine I tried to run my code in another file using the cow mesh for comparison, and the code ran without error, so my mesh is the element triggering the error. I'm guessing it's due to its large size ? Is there a standard way to cut a large mesh into parts that could be batched in pytorch3d please ?
Thanks again for the help
You can take a piece of a mesh with submeshes - https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/structures/meshes.py#L1550 .
I wonder if the problem isn't the size but the layout of the vertices? The error is cuMemcpy2DUnaligned failed: invalid argument
on the output of verts_packed(). Can you check that meshes.verts_packed() is actually on the GPU? And maybe see whether it is contiguous?
I added these lines to _torch_to_opengl :
def _torch_to_opengl(torch_tensor, cuda_context, cuda_buffer):
import torch
print("type(torch_tensor):", type(torch_tensor))
print("isinstance(torch_tensor, torch.Tensor):", isinstance(torch_tensor, torch.Tensor))
print("torch_tensor.device:", torch_tensor.device)
print("is contiguous:", torch_tensor.is_contiguous())
and got
type(torch_tensor): <class 'torch.Tensor'>
isinstance(torch_tensor, torch.Tensor): True
torch_tensor.device: cuda:0
is contiguous: True
I also tried to generate a new mesh similar to my previous one but smaller and the code ran successfully, so it would seem that the larger size does trigger the error.