Multi-GPU memory problems during HashSet initialization
Checklist
- [X] I have searched for similar issues.
- [X] For Python issues, I have tested with the latest development wheel.
- [X] I have checked the release documentation and the latest documentation (for master branch).
Describe the issue
I have found some GPU memory problems during HashSet initialization. I am using Open3D 0.15.2 installed through pip.

When I have multiple GPUs (tested on 8) and initialize a hash set through `open3d.core.HashSet`, it allocates some GPU memory (about 389 MiB) on each card. You can easily reproduce this problem with script 1 below (`o3d_mem_test.py`).
Processes displayed by `nvidia-smi`:

```
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     104246      C   python3                          391MiB |
|    1   N/A  N/A     104246      C   python3                          389MiB |
|    2   N/A  N/A     104246      C   python3                          389MiB |
|    3   N/A  N/A     104246      C   python3                          389MiB |
|    4   N/A  N/A     104246      C   python3                          389MiB |
|    5   N/A  N/A     104246      C   python3                          389MiB |
|    6   N/A  N/A     104246      C   python3                          389MiB |
|    7   N/A  N/A     104246      C   python3                          389MiB |
+-----------------------------------------------------------------------------+
```
When I use the hash set in a multi-process program, a typical usage being with `torch.distributed` (in this case, setting `CUDA_VISIBLE_DEVICES` globally is not an option), each process seems to take up some memory on every GPU. For example, when I start 8 processes on 8 GPUs, each process takes about 390 MiB on each of the 8 GPUs, so in total about 3120 MiB of GPU memory is allocated on every GPU. You can easily reproduce this problem with script 2 below (`o3d_dist_mem_test.py`).
Processes displayed by `nvidia-smi`:

```
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     114962      C   /home/tops/bin/python3           391MiB |
|    0   N/A  N/A     114963      C   /home/tops/bin/python3           389MiB |
|    0   N/A  N/A     114964      C   /home/tops/bin/python3           389MiB |
|    0   N/A  N/A     114965      C   /home/tops/bin/python3           389MiB |
|    0   N/A  N/A     114966      C   /home/tops/bin/python3           389MiB |
|    0   N/A  N/A     114967      C   /home/tops/bin/python3           389MiB |
|    0   N/A  N/A     114968      C   /home/tops/bin/python3           389MiB |
|    0   N/A  N/A     114969      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114962      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114963      C   /home/tops/bin/python3           391MiB |
|    1   N/A  N/A     114964      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114965      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114966      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114967      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114968      C   /home/tops/bin/python3           389MiB |
|    1   N/A  N/A     114969      C   /home/tops/bin/python3           389MiB |
|    2   N/A  N/A     114962      C   /home/tops/bin/python3           389MiB |
|    2   N/A  N/A     114963      C   /home/tops/bin/python3           389MiB |
|    2   N/A  N/A     114964      C   /home/tops/bin/python3           391MiB |
...
|    6   N/A  N/A     114968      C   /home/tops/bin/python3           391MiB |
|    6   N/A  N/A     114969      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114962      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114963      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114964      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114965      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114966      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114967      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114968      C   /home/tops/bin/python3           389MiB |
|    7   N/A  N/A     114969      C   /home/tops/bin/python3           391MiB |
+-----------------------------------------------------------------------------+
```
Thanks for the great Open3D library, which has helped us a lot. Is there any way to disable this GPU memory allocation? Thanks!
Steps to reproduce the bug
```python
# reproduce script 1: o3d_mem_test.py
# run with: python3 o3d_mem_test.py
# each GPU on the machine uses about 389 MiB
import time
import subprocess

import open3d.core as o3c

o3c_device = o3c.Device('CUDA:0')
hashset = o3c.HashSet(init_capacity=1000,
                      key_dtype=o3c.int64,
                      key_element_shape=o3c.SizeVector((1,)),
                      device=o3c_device)
print('hashset init done, use nvidia-smi to check GPU memory:')
subprocess.Popen('nvidia-smi', shell=True)
time.sleep(2)
```
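For the single-process case, one workaround worth trying (a sketch based on my assumption that the per-GPU allocations come from CUDA context creation on every visible device, not a confirmed Open3D behavior) is to hide all but one GPU before the first CUDA-using import. The helper name `restrict_to_gpu` is hypothetical:

```python
import os

def restrict_to_gpu(gpu_id: int) -> None:
    # Hypothetical helper: hide all GPUs except one from this process.
    # CUDA_VISIBLE_DEVICES must be set before the first CUDA-using import
    # (open3d, torch), because devices are enumerated when the CUDA
    # context is first created.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)

restrict_to_gpu(0)
# import open3d.core as o3c  # deferred import: this process now sees one GPU
# hashset = o3c.HashSet(init_capacity=1000,
#                       key_dtype=o3c.int64,
#                       key_element_shape=o3c.SizeVector((1,)),
#                       device=o3c.Device('CUDA:0'))
print(os.environ['CUDA_VISIBLE_DEVICES'])  # prints: 0
```

Note that the environment variable has no effect if set after `open3d` (or `torch`) has already initialized CUDA, which is why the import is deferred.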
```python
# reproduce script 2: o3d_dist_mem_test.py
import os
import argparse
import time
import subprocess

import torch
import open3d.core as o3c

# each GPU runs 2 processes and uses about 782 MiB:
# python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=$RANDOM o3d_dist_mem_test.py --gpus '0,1'
# each GPU runs 4 processes and uses about 1562 MiB:
# python3 -m torch.distributed.launch --nproc_per_node=4 --master_port=$RANDOM o3d_dist_mem_test.py --gpus '0,1,2,3'
# each GPU runs 8 processes and uses about 3120 MiB:
# python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=$RANDOM o3d_dist_mem_test.py --gpus '0,1,2,3,4,5,6,7'


def main():
    parser = argparse.ArgumentParser(description='torch.distributed.launch + Open3D HashSet GPU memory test')
    parser.add_argument('--local_rank', type=int, default=-1, help='auto-filled distributed rank (GPU id)')
    parser.add_argument('--gpus', type=str, default='0,1', required=True, help='GPUs to use')
    args = parser.parse_args()

    # GPU config
    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpus
    assert torch.cuda.is_available(), 'torch.cuda.is_available() is False'

    # distributed init
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
    assert torch.distributed.get_rank() == args.local_rank, 'args.local_rank != torch.distributed.get_rank()'
    device = torch.device('cuda', args.local_rank)
    torch.cuda.set_device(device)

    print('device: {:s}, init hashset start.'.format(str(device)))
    o3c_device = o3c.Device(str(device))
    hashset = o3c.HashSet(init_capacity=1000,
                          key_dtype=o3c.int64,
                          key_element_shape=o3c.SizeVector((1,)),
                          device=o3c_device)
    print('device: {:s}, hashset init done, use nvidia-smi to check GPU memory:'.format(str(device)))
    subprocess.Popen('nvidia-smi', shell=True)
    time.sleep(20)


if __name__ == '__main__':
    main()
```
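For the multi-process case, a single global `CUDA_VISIBLE_DEVICES` indeed cannot work, but per-rank pinning might: each launched process sets the variable to its own `local_rank` before importing `open3d` or `torch`, and then addresses its GPU as `CUDA:0` inside the process. This is only a sketch under that assumption, not a verified fix:

```python
import os
import argparse

# Parse the --local_rank argument that torch.distributed.launch injects
# (parse_known_args so other launcher arguments are ignored).
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args, _ = parser.parse_known_args()

# Pin this process to its own GPU *before* importing any CUDA-using library,
# so a CUDA context (and its ~390 MiB) is only created on that one device.
os.environ['CUDA_VISIBLE_DEVICES'] = str(args.local_rank)

# import open3d.core as o3c  # deferred: this rank now sees exactly one GPU
# hashset = o3c.HashSet(init_capacity=1000,
#                       key_dtype=o3c.int64,
#                       key_element_shape=o3c.SizeVector((1,)),
#                       device=o3c.Device('CUDA:0'))  # always CUDA:0 here
print('rank', args.local_rank, '-> CUDA_VISIBLE_DEVICES =',
      os.environ['CUDA_VISIBLE_DEVICES'])
```

The trade-off is that `torch.cuda.set_device` must then also use device 0 in every rank, since each process only sees its own GPU.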
Error message
None
Expected behavior
When I specify a device such as `o3c.Device('CUDA:gpu_id')`, Open3D should only use the GPU I specify.
Open3D, Python and System information
- Operating system: Ubuntu 20.04
- Python version: Python 3.8
- Open3D version: 0.15.2
- System architecture: x86
- Is this a remote workstation?: yes
- How did you install Open3D?: pip
Additional information
No response
Hi,

I also have problems with multiple GPUs. For example, `./OnlineSLAMRGBD --device CUDA:0` works fine, but with `CUDA:1` it crashes just after processing:

```
(Open3D_15) ola@dig6:~/Proj2/Open3D_15/Open3D/build/bin/examples$ ./OnlineSLAMRGBD --device CUDA:0
[Open3D INFO] Using device CUDA:0.
[Open3D INFO] Using Primesense default intrinsics.
FEngine (64 bits) created at 0x7f8275314010 (threading is enabled)
FEngine resolved backend: OpenGL
[Open3D INFO] Writing reconstruction to scene.ply...
[Open3D INFO] Writing trajectory to trajectory.log...
(Open3D_15) ola@dig6:~/Proj2/Open3D_15/Open3D/build/bin/examples$ ./OnlineSLAMRGBD --device CUDA:1
[Open3D INFO] Using device CUDA:1.
[Open3D INFO] Using Primesense default intrinsics.
FEngine (64 bits) created at 0x7f584c2d5010 (threading is enabled)
FEngine resolved backend: OpenGL
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  tabulate: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)
```
System: CUDA 11.4, Ubuntu 20.04, Open3D 0.15.2, C++, built from source.
Kind regards Ola
I've met a similar multi-GPU problem: I can't force Open3D to use only one GPU. Have you solved this?