IsaacLab
[Question] Implementing a large number of Cameras
Main Challenge
I'm currently trying to set up an environment with a large number of cameras, each taking a single picture at a single point in time. When I initially tried to do that, my simulation crashed with one of the following error messages:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 23.68 GiB total capacity; 8.44 MiB already allocated; 57.12 MiB free; 22.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
RuntimeError: Array allocation failed on device: cuda:0 for 16777216 bytes
This happens as soon as the number of cloned environments (and therefore cameras) is 10 or larger. If I read the above error message correctly, I should still have plenty of capacity on my GPU, but it's not marked as "free" - how does this happen?
Also, if my math is correct, a single image of an environment should not take up more than 16 MB of space (2048 * 2048 * 4 * 8 bits).
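For reference, a quick sanity check of the per-frame size, plus the allocator setting suggested by the error message (a sketch; whether max_split_size_mb actually helps here is unclear, since the memory may be held by the renderer rather than by fragmented PyTorch allocations):
# One RGBA frame at 2048 x 2048 with 8 bits per channel.
height, width, channels, bytes_per_channel = 2048, 2048, 4, 1
frame_bytes = height * width * channels * bytes_per_channel
print(frame_bytes)          # 16777216 bytes, matching the failed allocation above
print(frame_bytes / 2**20)  # 16.0 MiB

# The OOM message suggests tuning the PyTorch caching allocator. This is set via an
# environment variable before launching the script, e.g.:
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128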
Workaround
I then thought that a possible workaround could be to iteratively position the camera over each of the environments, take a picture, and add it to a list. This works, but in practice I need to step the simulation a few times (3) every time I reposition the camera before it "sees" something. This time adds up very quickly: say I have 1000 envs and I'm running at 3 fps, then stepping 3 times per environment works out to roughly 1000 seconds, i.e. over 15 minutes, to loop over all of them.
Questions
- Is there any way to make this more efficient?
- Am I doing something wrong with the way I implement the parallel cameras?
- Can I allocate more space on the GPU for the camera data?
- Is my GPU just not powerful enough?
As always, if I can provide more code/data/anything to help solve this, I'm more than happy to.
Example Code
My camera setup for the parallel run (simplified) is as follows:
sim_cfg = sim_utils.SimulationCfg(dt=0.01, use_gpu_pipeline=True, device="cuda:0")
sim = sim_utils.SimulationContext(sim_cfg)
...
camera_cfg = CameraCfg(
    prim_path="/World/envs/env_.*/Camera_RGB",
    update_period=0,
    height=2048,
    width=2048,
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 1.0e5),
    ),
)
...
while simulation_app.is_running():
    ...
    if count == 20:
        sim.pause()
        camera = Camera(cfg=camera_cfg)
        sim.play()
        for _ in range(2):
            sim.step()
            camera.update(sim_dt)
        camera_captures = camera.data.info[0]
        camera.__del__()
And my (simplified) setup for iterating through the environments:
sim_cfg = sim_utils.SimulationCfg(dt=0.01, use_gpu_pipeline=True, device="cuda:0")
sim = sim_utils.SimulationContext(sim_cfg)
...
camera_cfg = CameraCfg(
    prim_path="/Cameras/Camera_RGB",
    update_period=0,
    height=2048,
    width=2048,
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 1.0e5),
    ),
)
...
while simulation_app.is_running():
    ...
    if count == 20:
        sim.pause()
        camera = Camera(cfg=camera_cfg)
        sim.play()
        camera_captures = []
        for env_index in envs_to_capture:
            camera_pos = envs_positions[env_index, :] + camera_offset_pos
            camera_rot = camera_offset_rot
            camera.set_world_poses(
                positions=camera_pos.unsqueeze(0),
                orientations=camera_rot.unsqueeze(0),
                convention="opengl",
            )
            for _ in range(2):
                sim.step()
                camera.update(dt=0.01)
            camera_captures.append(camera.data.info[0])
        camera.__del__()
System specs
- Using the devel branch of Orbit
- Commit: aaab27b
- Isaac Sim Version: 2023.1.0-hotfix.1
- OS: Ubuntu 22.04
- GPU: RTX 3090
- CUDA: 12.0
- GPU Driver: 525.147.05
Hello, I am encountering a similar issue. I believe that initializing the camera alone consumes a substantial amount of GPU memory. I have conducted tests on this; one camera alone appears to use approximately 1 GB of GPU memory. Therefore, it seems impossible to run over 1000 environments when utilizing cameras.
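For what it's worth, one way to sanity-check that number is to compare the device-wide free memory before and after creating the camera (a sketch; it assumes the Camera and camera_cfg from the snippets above and that torch is already initialized on the same device):
import torch

def report_vram(tag: str) -> None:
    # mem_get_info() reports (free, total) bytes for the whole device, so it also
    # captures memory held by the renderer, not just by PyTorch tensors.
    free, total = torch.cuda.mem_get_info()
    print(f"{tag}: {(total - free) / 2**30:.2f} GiB used of {total / 2**30:.2f} GiB")

report_vram("before camera init")
camera = Camera(cfg=camera_cfg)  # Camera/camera_cfg as in the original snippets
report_vram("after camera init")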
Well at least I'm not the only one. That makes me wonder though where all that memory usage is coming from?
Also, if I do not call camera.update(), the simulation at least does not crash. Even though that does not help with the immediate issue, maybe it's an indication for where the issue is coming from.
I had a similar issue and tried to solve it but failed for various reasons. However, looking at the Orbit documentation linked below, I expect this problem to be solved once "Cameras (parallelized)", scheduled on the roadmap for July 2023, is officially released. It looks like there's a delay with Isaac Sim's update. https://isaac-orbit.github.io/orbit/source/refs/roadmap.html
Well, the current devel branch supports parallelized cameras, so I think that feature is implemented already. I think something is just causing the cameras to have a massive vRAM overhead and it's just not feasible right now to have more than (in my case) 9 of them. Would be great to have official confirmation of this though.
Hi everyone,
Thanks for bringing up this discussion. We are going to restart this investigation on multiple cameras again.
At least in our previous benchmarks, we could have two cameras per environment and go up to 8 environments at roughly 40FPS on an RTX 3060. Beyond that, the simulation crashes because of vRAM issues.
I think one of the main factors there was using an app experience file similar to OIGE's:
https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/blob/main/apps/omni.isaac.sim.python.gym.camera.kit
We have yet to incorporate this into Orbit, but it should be something that can be added, similar to how we load different experience files in the workflows:
https://github.com/NVIDIA-Omniverse/Orbit/blob/devel/source/standalone/workflows/rsl_rl/train.py#L38-L41
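For anyone who wants to try this before it lands in Orbit, a minimal sketch of what launching with a custom experience file could look like (the .kit path is a placeholder for the OIGE file linked above; SimulationApp accepts an experience argument, but how Orbit's own launcher would expose this is an assumption):
from omni.isaac.kit import SimulationApp

# Placeholder path to a stripped-down, camera-oriented app profile such as the
# OIGE .kit file linked above -- adjust to its actual location on your machine.
EXPERIENCE = "/path/to/omni.isaac.sim.python.gym.camera.kit"

# A leaner experience file loads fewer extensions, which can reduce the per-process
# vRAM overhead before any cameras are even created.
simulation_app = SimulationApp({"headless": True}, experience=EXPERIENCE)

# ... the usual Orbit setup (SimulationCfg, SimulationContext, cameras) follows here.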
On a side note, yes, the Camera sensor returns batched data. However, this doesn't mean the rendering is happening in parallel. Some work is going on to make this more efficient in Isaac Sim.
Of all the options, the most straightforward is to adapt Orbit's camera internals to address the TODO note mentioned here:
https://github.com/NVIDIA-Omniverse/Orbit/blob/devel/source/extensions/omni.isaac.orbit/omni/isaac/orbit/sensors/camera/camera.py#L387-L388
I think this should be possible now in Isaac Sim 2023.1. OIGE does some version of it:
https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/blob/main/omniisaacgymenvs/tasks/cartpole_camera.py#L97-L107
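As a rough illustration of that direction, here is a hedged sketch using the plain omni.replicator.core annotator API (OIGE's actual implementation uses its own writer/listener classes, and integrating this into Orbit's Camera internals would look different; num_envs and the prim paths are placeholders):
import omni.replicator.core as rep

num_envs = 16  # placeholder; in practice this comes from the environment config

# One render product per camera prim, kept at a deliberately small resolution.
render_products = [
    rep.create.render_product(f"/World/envs/env_{i}/Camera_RGB", resolution=(128, 128))
    for i in range(num_envs)
]

# One rgb annotator per render product; data can then be fetched without
# spawning a separate Orbit Camera object for every environment.
annotators = []
for rp in render_products:
    annot = rep.AnnotatorRegistry.get_annotator("rgb")
    annot.attach([rp])
    annotators.append(annot)

# After stepping the simulation so frames have been rendered:
images = [annot.get_data() for annot in annotators]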
I still have to get around to trying this out, but due to my schedule it has been difficult lately. I would really appreciate help on it if possible.
Hello, I am also facing this problem. Even 2 cameras per env consume a lot of memory, making any visual RL very difficult, if possible at all, even without rendering. Is NVIDIA working on it? Can we expect a fix in the near future? Side question: is anybody doing visual RL using Isaac Sim?
Resolved in the 1.2 release: https://isaac-sim.github.io/IsaacLab/source/overview/reinforcement-learning/performance_benchmarks.html#benchmark-results
Using the camera benchmark tool for custom environments, I was able to train with 1024 low-resolution cameras on an NVIDIA 3090 laptop GPU: #976