
Question: virtual environment rendering/acceleration

Open · AlbertTan404 opened this issue 1 year ago • 5 comments

Hi there! Thanks for your impressive work and beautiful code :) I tried to run lift_image_abs with the transformer hybrid workspace in HEADLESS mode, but it logged:

[root][INFO] Command '['/mambaforge/envs/robodiff/lib/python3.9/site-packages/egl_probe/build/test_device', '0']' returned non-zero exit status 1.
[root][INFO] - Device 0 is not available for rendering

and this message repeats for all 4 GPUs. Afterwards, I found that the "Eval LiftImage" process is really slow. Should I enable or install some driver for hardware acceleration?
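As a quick check, here is a minimal sketch (assuming egl_probe exposes get_available_devices(), the same probe whose test_device binary produced the log above) to list which GPU ids it considers usable for offscreen rendering:

import egl_probe

# Sketch: query the same compiled probe that logged the error above.
# An empty list is consistent with "Device N is not available for rendering".
print('EGL-capable devices:', egl_probe.get_available_devices())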

nvidia-smi output during Eval (GPU-Util stays at 0%): [image]

top output during Eval: [image]

wandb monitor data: [image]

AlbertTan404 · Aug 21 '23 09:08

Hi @AlbertTan404, in my experience the eval process is CPU bound, so I'm surprised to find low CPU usage on your system during eval. I don't have experience dealing with this problem, but I suspect most of the time is spent inside the robomimic environments.

cheng-chi · Sep 07 '23 23:09


Thanks for your reply. I'll look into the inference process in the robomimic env.
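One rough way to see where an eval job is spending its time is to sample per-worker CPU usage, a sketch assuming psutil on a Linux system (by default it inspects the current process; pass the eval job's PID from top/ps to inspect a running eval instead):

import time
import psutil

# Sketch (assumed diagnostic, not from this thread): sample CPU usage and
# affinity of a process and all of its worker subprocesses.
def report(pid=None):
    root = psutil.Process(pid)
    procs = [root] + root.children(recursive=True)
    for p in procs:
        p.cpu_percent(None)  # prime the per-process counters
    time.sleep(1.0)
    for p in procs:
        print(p.pid, p.name(), f'{p.cpu_percent(None):5.1f}%', p.cpu_affinity())

if __name__ == '__main__':
    report()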

AlbertTan404 · Sep 08 '23 04:09

Hi @AlbertTan404, I recently encountered a similar issue on my machine as well. It turns out to be a bug in recent versions of PyTorch when installed through conda: https://github.com/pytorch/pytorch/issues/99625. This bug causes all subprocesses created after import torch to inherit a CPU affinity pinned to the first CPU core, which squeezes all of the dataloader workers and robomimic env workers onto the same core and drastically decreases performance. As described in the PyTorch issue, the solution is: conda install llvm-openmp=14

You can check if you are affected by running this script:

import multiprocessing as mp

import psutil

def print_affinity_before():
    # Runs in a subprocess created before torch is imported; it should report
    # the full set of CPU cores.
    print('before import torch', psutil.Process().cpu_affinity())

p = mp.Process(target=print_affinity_before)
p.start()
p.join()

import torch  # imported mid-script on purpose: this is what triggers the bug

def print_affinity_after():
    # Runs in a subprocess created after torch is imported; on an affected
    # system the affinity collapses to a single core, e.g. [0].
    print('after import torch', psutil.Process().cpu_affinity())

p = mp.Process(target=print_affinity_after)
p.start()
p.join()

This is the result on my machine before and after the fix: [screenshots]

I will pin the llvm-openmp version in this repo as well.
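If pinning llvm-openmp is not an option, a possible workaround (a general Linux-side approach I'm assuming applies here, not part of this repo's fix) is to reset the main process's CPU affinity after import torch, so that workers forked afterwards inherit the full core set:

import os
import multiprocessing as mp

import psutil
import torch  # importing torch is what triggers the affinity bug

# Assumed workaround: restore the full CPU set on the main process so that
# dataloader / env worker subprocesses created later inherit all cores
# instead of being pinned to core 0.  Linux-only (os.sched_setaffinity).
os.sched_setaffinity(0, range(os.cpu_count()))

def check():
    print('worker affinity:', psutil.Process().cpu_affinity())

if __name__ == '__main__':
    p = mp.Process(target=check)
    p.start()
    p.join()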

cheng-chi · Sep 13 '23 06:09

Great, thanks! I found it significantly speeds up the evaluation process. By the way, conda install llvm-openmp=14 takes a long time on my machine, while mamba install llvm-openmp=14 works better.

AlbertTan404 · Sep 13 '23 14:09

@AlbertTan404 Great! I want to keep this issue open so that other people can find it as well.

cheng-chi · Sep 13 '23 16:09