[🐛 bug report] Unable to run multiple processes on a single GPU card

Open Daiqy opened this issue 2 years ago • 5 comments

Summary

Cannot run multiple mitsuba3 processes on one GPU card.

System configuration

System information:

  OS: Ubuntu 20.04 LTS
  CPU: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
  GPU: NVIDIA GeForce RTX 3090
  Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
  NVidia driver: 515.65.01
  CUDA: 11.3.58
  LLVM: 13.0.1

  Dr.Jit: 0.2.2
  Mitsuba: 3.0.2
     Is custom build? False
     Compiled with: GNU 10.2.1
     Variants:
        scalar_rgb
        scalar_spectral
        cuda_ad_rgb
        llvm_ad_rgb

Description

Hi, I am trying to run multiple Mitsuba 3 processes on a single GPU card to render in parallel (the code is shown in the next section). However, as the number of processes increases, the rendering time of each individual process increases as well, even though the GPU is not fully occupied.

In particular, once the number of processes exceeds 25, errors from the Dr.Jit backend may be reported.

Does Mitsuba 3 support running multiple processes on one card at the same time? Is this a bug? If not, could you kindly provide some advice on multi-process rendering with Mitsuba 3?

Thank you and looking forward to your reply!

Steps to reproduce

Here is my multiprocessing code. It uses the cbox scene asset provided in the Mitsuba 3 source code. I run it with:

CUDA_VISIBLE_DEVICES=0 python test_parallel_drjit.py

The contents of test_parallel_drjit.py:

import time
from multiprocessing import Pool

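def batch_exec(args):
    # Unpack a (function, argument tuple) pair and call the function.
    return args[0](*args[1])

def multi_process_exec(f, args_mat, pool_size):
    # Run f once per argument tuple in args_mat on a pool of worker processes.
    if len(args_mat) == 0:
        return []
    results = []
    with Pool(processes=pool_size) as pool:
        imap_it = pool.imap(batch_exec, [(f, args) for args in args_mat])
        # imap yields the results in submission order.
        for ret in imap_it:
            results.append(ret)
    return results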


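def mitsuba_render(a, n):
    # Import Mitsuba here so that it is initialized in the worker process
    # rather than in the parent.
    import mitsuba as mi
    mi.set_variant("cuda_ad_rgb")

    scene = mi.load_file("./scenes/cbox.xml")

    # Time the render together with the copy of the image into a NumPy array.
    time1 = time.time()
    image = mi.render(scene)
    image = image.numpy()
    time2 = time.time()

    print("Proc ID: ", a, ", rendering time: ", time2-time1)

    return image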


if __name__=='__main__':
    num_process = 30
    print("Proc num: ", num_process)
    args_mat = [(i,0) for i in range(num_process)]
    results = multi_process_exec(mitsuba_render, args_mat, num_process)

Daiqy avatar Oct 19 '22 06:10 Daiqy

It might be that all processes use the same CUDA stream, in which case synchronisation points would affect all processes. Unfortunately, it isn't possible to manually set the stream per process / thread. I will add this to our TODO list, but I doubt we will be able to address this issue in the coming weeks.

Speierers avatar Oct 25 '22 08:10 Speierers

Hi, I also encountered this issue when rendering a synthetic dataset with the latest version of Mitsuba 3 (v3.1.1). Do you have any plans to address it soon?

wylighting avatar Dec 17 '22 16:12 wylighting

(EDIT: I first thought that this might be a Windows-related issue, but I see now that you are running on Ubuntu.)

Could you check whether separate runs of the mitsuba executable on the command line can max out the GPU? The goal would be to see whether the issue you are reporting is somehow tied to the multiprocessing module.

I don't think CUDA streams mentioned by @Speierers above could be the reason, since those are per-process.

wjakob avatar Dec 18 '22 19:12 wjakob

Hi, I ran into the same problem. I tried running the Mitsuba code in separate programs WITHOUT the multiprocessing module, and the problem remains. With only one program running, the rendering time is about 0.1s. It becomes 0.16s with two programs running and 0.24s with three.

My code is below:

import mitsuba as mi
import time

while True:
    mi.set_variant("cuda_ad_rgb")

    scene = mi.load_file("./scenes/cbox/cbox.xml")

    time1 = time.time()
    image = mi.render(scene)
    image = image.numpy()
    time2 = time.time()
    print(f"Time: {time2-time1}")

Looking forward to your reply. Thanks!

JYChen18 avatar Feb 06 '23 07:02 JYChen18

Hi @JYChen18

You're most likely maxing out your GPU, or at least using enough of it that these processes can no longer run fully in parallel. Can you check your GPU usage? It's worth trying with spp=1; that should hopefully be a small enough workload.
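One way to read the timings reported earlier in this thread (0.1s for one process, 0.16s for two, 0.24s for three) is to convert them into aggregate throughput. The sketch below only does that arithmetic. It assumes that every process renders continuously and that the quoted numbers are steady-state times per render, which the thread does not confirm; the function name is made up for illustration.

```python
# Timings quoted in this thread:
# number of concurrent processes -> seconds per render in each process.
reported = {1: 0.10, 2: 0.16, 3: 0.24}


def aggregate_throughput(num_procs, seconds_per_render):
    """Total renders per second across all processes (hypothetical helper)."""
    return num_procs / seconds_per_render


for num_procs, seconds in reported.items():
    total = aggregate_throughput(num_procs, seconds)
    print(f"{num_procs} process(es): {total:.1f} renders/s in total")
```

Under these assumptions, total throughput rises from 10 to 12.5 renders per second with a second process and does not improve with a third, which would be consistent with the GPU already being close to saturated.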

njroussel avatar Feb 09 '23 19:02 njroussel

Is there any update on this? Does Mitsuba 3 support the parallel rendering of multiple scenes?

markusheimerl avatar Jun 28 '24 20:06 markusheimerl

I'll close this, as it's been a while since the original post.

I'll write a short summary below. If you still have open questions, feel free to open a new issue/discussion.

When rendering on the GPU, Mitsuba creates its own synchronization points for various reasons. These synchronization points are global to the process because they use the global/legacy/zero CUDA stream. It is therefore impossible to have two mi.render() calls running in parallel within the same process. The workaround is to use multiple processes, as their respective synchronization points are independent of each other. Of course, this only helps if the workloads are small enough to actually run in parallel on the GPU.
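A minimal sketch of the multi-process workaround, assuming the render itself lives in a separate script that is started once per process. The launcher uses only the Python standard library; `run_in_parallel` is a hypothetical helper, and the demo command is a stand-in workload rather than an actual Mitsuba render.

```python
import subprocess
import sys
import time


def run_in_parallel(command, num_procs):
    """Start num_procs copies of command at once (hypothetical helper).

    Returns one (exit_code, seconds) pair per copy. Processes are waited on
    in launch order, so each time is an upper bound on when that copy ended.
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(num_procs)]
    results = []
    for proc in procs:
        exit_code = proc.wait()
        results.append((exit_code, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with a command that runs your
    # own render script, which would call mi.render() in its own process.
    command = [sys.executable, "-c", "import time; time.sleep(0.1)"]
    for index, (exit_code, seconds) in enumerate(run_in_parallel(command, 3)):
        print(f"process {index}: exit code {exit_code}, done after {seconds:.2f}s")
```

Because every copy is a separate operating-system process, each one gets its own process-wide synchronization points. Whether the renders actually overlap on the GPU still depends on the size of the workload, as noted above.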

njroussel avatar Jul 10 '24 08:07 njroussel