
Multi-GPU memory problems during HashSet initialization

Open · MingChaoSun opened this issue 3 years ago

Describe the issue

I have found some GPU memory problems during HashSet initialization.

I am using Open3D 0.15.2 installed through pip.

When I have multiple GPUs (tested on 8) and initialize a hashset through open3d.core.HashSet, it allocates some GPU memory (about 389 MiB) on each card. You can easily reproduce this problem with script 1 below (o3d_mem_test.py).

Processes displayed by nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    104246      C   python3                          391MiB  |
|    1   N/A  N/A    104246      C   python3                          389MiB  |
|    2   N/A  N/A    104246      C   python3                          389MiB  |
|    3   N/A  N/A    104246      C   python3                          389MiB  |
|    4   N/A  N/A    104246      C   python3                          389MiB  |
|    5   N/A  N/A    104246      C   python3                          389MiB  |
|    6   N/A  N/A    104246      C   python3                          389MiB  |
|    7   N/A  N/A    104246      C   python3                          389MiB  |
+-----------------------------------------------------------------------------+
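In the single-process case, a common way to keep a CUDA library off the other cards is to hide them before the CUDA runtime initializes. This is a hedged workaround sketch, not an Open3D API: pin_single_gpu is a hypothetical helper, and it only helps if it runs before importing open3d (or any other CUDA library) in the process.

```python
import os

def pin_single_gpu(gpu_id):
    """Hide every GPU except `gpu_id` from this process.

    Must run before any CUDA library (Open3D, PyTorch, ...) initializes
    the driver; once CUDA is initialized the variable has no effect.
    The chosen physical GPU then appears in-process as 'CUDA:0'.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return "CUDA:0"  # logical device string for o3c.Device(...)

device_str = pin_single_gpu(3)  # physical GPU 3 becomes logical CUDA:0
# import open3d.core as o3c    # import *after* pinning
# hashset = o3c.HashSet(init_capacity=1000,
#                       key_dtype=o3c.int64,
#                       key_element_shape=o3c.SizeVector((1,)),
#                       device=o3c.Device(device_str))
```

Whether this avoids the per-device allocation here depends on when Open3D creates its CUDA contexts, so it needs verifying against the script above.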

When I use the hashset in a multi-process program (a typical usage is with torch.distributed; in this case, restricting CUDA_VISIBLE_DEVICES is not an option), each process seems to take up some memory on every GPU. For example, when I start 8 processes on 8 GPUs, each process takes about 390 MiB on each of the 8 GPUs, so about 3120 MiB of GPU memory ends up allocated on every GPU. You can easily reproduce this problem with script 2 below (o3d_dist_mem_test.py).

Processes displayed by nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    114962      C   /home/tops/bin/python3           391MiB  |
|    0   N/A  N/A    114963      C   /home/tops/bin/python3           389MiB  |
|    0   N/A  N/A    114964      C   /home/tops/bin/python3           389MiB  |
|    0   N/A  N/A    114965      C   /home/tops/bin/python3           389MiB  |
|    0   N/A  N/A    114966      C   /home/tops/bin/python3           389MiB  |
|    0   N/A  N/A    114967      C   /home/tops/bin/python3           389MiB  |
|    0   N/A  N/A    114968      C   /home/tops/bin/python3           389MiB  |
|    0   N/A  N/A    114969      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114962      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114963      C   /home/tops/bin/python3           391MiB  |
|    1   N/A  N/A    114964      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114965      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114966      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114967      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114968      C   /home/tops/bin/python3           389MiB  |
|    1   N/A  N/A    114969      C   /home/tops/bin/python3           389MiB  |
|    2   N/A  N/A    114962      C   /home/tops/bin/python3           389MiB  |
|    2   N/A  N/A    114963      C   /home/tops/bin/python3           389MiB  |
|    2   N/A  N/A    114964      C   /home/tops/bin/python3           391MiB  |
  ...                                                                  ...
|    6   N/A  N/A    114968      C   /home/tops/bin/python3           391MiB  |
|    6   N/A  N/A    114969      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114962      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114963      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114964      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114965      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114966      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114967      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114968      C   /home/tops/bin/python3           389MiB  |
|    7   N/A  N/A    114969      C   /home/tops/bin/python3           391MiB  |
+-----------------------------------------------------------------------------+
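For the torch.distributed case the report notes that a global CUDA_VISIBLE_DEVICES is not an option, but pinning it per worker process can still work: each rank hides every GPU except its own before any CUDA import, and NCCL only needs each rank's own device. A hedged sketch, with pin_rank_gpu as a hypothetical helper; whether this plays well with a given launcher setup needs verifying.

```python
import os

def pin_rank_gpu(local_rank, gpus):
    """Map a distributed local rank to one physical GPU and hide the rest.

    Must run in the worker before importing torch or open3d; inside the
    process the selected card is then logical device 0.
    """
    gpu_list = [g.strip() for g in gpus.split(",") if g.strip()]
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_list[local_rank]
    return "CUDA:0"

# Example: rank 2 of an 8-process job pins physical GPU '2'.
device_str = pin_rank_gpu(2, "0,1,2,3,4,5,6,7")

# Without pinning, every process touches every GPU: 8 processes at
# ~390 MiB each put roughly 8 * 390 = 3120 MiB on every card, which
# matches the nvidia-smi output above.
per_gpu_mib = 8 * 390
```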

Thanks for the great Open3D, which has helped us a lot. I wonder if there is any way to disable this GPU memory allocation? Thanks.

Steps to reproduce the bug

# reproduce script 1: o3d_mem_test.py #
import open3d.core as o3c
import time
import subprocess

# python3 o3d_mem_test.py
# each GPU on the machine uses about 389 MiB
o3c_device = o3c.Device('CUDA:0')
hashset = o3c.HashSet(init_capacity=1000,
                      key_dtype=o3c.int64,
                      key_element_shape=o3c.SizeVector((1,)),
                      device=o3c_device)

print('hashset init done, use nvidia-smi check gpu mem:')
subprocess.Popen('nvidia-smi', shell=True)
time.sleep(2)

# reproduce script 2: o3d_dist_mem_test.py #
import os
import argparse
import time
import subprocess
import torch
import open3d.core as o3c


# 2 processes on 2 GPUs; each GPU ends up with about 782 MiB allocated
# python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=$RANDOM o3d_dist_mem_test.py --gpus '0,1'
# 4 processes on 4 GPUs; each GPU ends up with about 1562 MiB allocated
# python3 -m torch.distributed.launch --nproc_per_node=4 --master_port=$RANDOM o3d_dist_mem_test.py --gpus '0,1,2,3'
# 8 processes on 8 GPUs; each GPU ends up with about 3120 MiB allocated
# python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=$RANDOM o3d_dist_mem_test.py --gpus '0,1,2,3,4,5,6,7'
def main():
    parser = argparse.ArgumentParser(description='torch.distributed.launch + Open3D HashSet gpu mem Test')
    parser.add_argument("--local_rank", type=int, default=-1, help='auto-filled distributed rank (GPU id)')
    parser.add_argument("--gpus", type=str, default='0,1', required=True, help='GPUs to use.')
    args = parser.parse_args()

    # GPU config
    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpus
    assert torch.cuda.is_available(), 'torch.cuda.is_available() False'
    # distribute init
    torch.distributed.init_process_group(backend="nccl", init_method="env://")
    assert torch.distributed.get_rank() == args.local_rank, 'args.local_rank != torch.distributed.get_rank()'
    device = torch.device("cuda", args.local_rank)
    torch.cuda.set_device(device)

    print('device:{:s}, init hashset start.'.format(str(device)))
    o3c_device = o3c.Device(str(device))
    hashset = o3c.HashSet(init_capacity=1000,
                          key_dtype=o3c.int64,
                          key_element_shape=o3c.SizeVector((1,)),
                          device=o3c_device)
    print('device:{:s}, hashset init done, use nvidia-smi check gpu mem:'.format(str(device)))
    subprocess.Popen('nvidia-smi', shell=True)
    time.sleep(20)


if __name__ == '__main__':
    main()

Error message

None

Expected behavior

When specifying a device like o3c.Device('CUDA:gpu_id'), only the GPU I specify should be used.
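The expectation above can be checked mechanically from the nvidia-smi process table: for a given PID, collect the GPU indices it appears on and confirm the set has exactly one element. A rough sketch; gpus_used_by is a hypothetical helper and the table layout can differ between driver versions.

```python
import re

def gpus_used_by(pid, smi_text):
    """Return the set of GPU indices on which `pid` appears in the
    Processes table of nvidia-smi output (rough parser for the default
    layout; the exact format varies between driver versions)."""
    gpus = set()
    for line in smi_text.splitlines():
        m = re.match(r"\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+", line)
        if m and int(m.group(2)) == pid:
            gpus.add(int(m.group(1)))
    return gpus

# Two sample rows in the shape reported above:
sample = ("|    0   N/A  N/A    104246      C   python3    391MiB |\n"
          "|    1   N/A  N/A    104246      C   python3    389MiB |\n")

result = gpus_used_by(104246, sample)
# With the bug, result is {0, 1}; the expected behavior would be a
# single-element set matching the device passed to o3c.Device.
```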

Open3D, Python and System information

- Operating system: Ubuntu 20.04
- Python version: Python 3.8
- Open3D version: 0.15.2
- System architecture: x86
- Is this a remote workstation?: yes
- How did you install Open3D?: pip

Additional information

No response

MingChaoSun · Jun 07 '22 05:06

Hi, I also have problems with multi GPU. For example, ./OnlineSLAMRGBD --device CUDA:0 works fine, but when using CUDA:1 it crashes right after it starts processing:

(Open3D_15) ola@dig6:~/Proj2/Open3D_15/Open3D/build/bin/examples$ ./OnlineSLAMRGBD --device CUDA:0
[Open3D INFO] Using device CUDA:0.
[Open3D INFO] Using Primesense default intrinsics.
FEngine (64 bits) created at 0x7f8275314010 (threading is enabled)
FEngine resolved backend: OpenGL
[Open3D INFO] Writing reconstruction to scene.ply...
[Open3D INFO] Writing trajectory to trajectory.log...

(Open3D_15) ola@dig6:~/Proj2/Open3D_15/Open3D/build/bin/examples$ ./OnlineSLAMRGBD --device CUDA:1
[Open3D INFO] Using device CUDA:1.
[Open3D INFO] Using Primesense default intrinsics.
FEngine (64 bits) created at 0x7f584c2d5010 (threading is enabled)
FEngine resolved backend: OpenGL
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  tabulate: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)

System: CUDA 11.4, Ubuntu 20.04, Open3D 0.15.2, C++, built from source.

Kind regards Ola

olagt · Sep 06 '22 14:09

I've met a similar multi-GPU problem. I can't force Open3D to use only one GPU. Have you solved this?

Buffyqsf · Dec 29 '23 06:12