Stabilize the GPU memory usage
The GPU memory usage during training with high-resolution settings is currently unstable and can lead to OOM errors. The primary cause is the `grid_prune` function in nerfacc, which produces a varying number of sampled points in each iteration. To address this issue and stabilize GPU memory usage, I propose the following methods:
- Limiting the maximum number of sampling points that require gradient computation.
- Dividing the entire rendering image into multiple blocks to stabilize the peak memory usage.
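The second idea needs only index arithmetic. The helper below is a hypothetical sketch (not code from the repo): it splits a rendering target into a grid of blocks matching a `block_nums` setting, so each block can be rendered in its own forward pass and peak memory scales with the block size rather than the full image.

```python
def split_into_blocks(height, width, block_nums):
    """Yield (row_slice, col_slice) pairs covering an image in a grid of blocks.

    Rendering each block in a separate pass caps the peak memory of the
    integration step at roughly 1 / (block_nums[0] * block_nums[1]) of the
    full-image cost.
    """
    rows, cols = block_nums
    # Edge positions; rounding spreads any remainder across the blocks.
    row_edges = [round(i * height / rows) for i in range(rows + 1)]
    col_edges = [round(j * width / cols) for j in range(cols + 1)]
    for i in range(rows):
        for j in range(cols):
            yield (slice(row_edges[i], row_edges[i + 1]),
                   slice(col_edges[j], col_edges[j + 1]))
```

For a 512x512 image with `block_nums: [3, 3]`, this yields nine blocks whose areas sum to the full image.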
Based on my experiment with NeRF at 512x512 rendering resolution, using 6,000,000 points for gradient computation consumes about 21 GB of memory, while NeRF integration requires about 16 GB. Therefore, my configuration is as follows:
```yaml
renderer_type: "patch-renderer"
renderer:
  mode: "interval"
  block_nums: [3, 3]
  base_renderer_type: "nerf-volume-renderer"
  base_renderer:
    radius: ${system.geometry.radius}
    num_samples_per_ray: 512
    train_max_nums: 6000000
```
With this configuration, memory usage stabilizes at roughly 23 GB (21 + 16 / (3 * 3) ≈ 22.8).
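The `train_max_nums` cap can be sketched as a selection step before the gradient pass. The helper below is a hypothetical illustration (not code from the repo): it picks which sample indices keep gradients, with the idea that the remaining points are rendered detached so the backward pass never sees more than the budget.

```python
import random

def select_grad_indices(num_points, train_max_nums, seed=None):
    """Choose which sample indices keep gradients.

    Points outside the returned set would be rendered detached (or under
    torch.no_grad()), so the backward pass touches at most train_max_nums
    points regardless of how many points grid pruning leaves alive.
    """
    if num_points <= train_max_nums:
        return list(range(num_points))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_points), train_max_nums))
```

This turns the per-iteration point count from an unbounded quantity into a fixed upper bound, which is what stabilizes the ~21 GB gradient-side figure above.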
Here are the results I obtained:
https://github.com/threestudio-project/threestudio/assets/24589363/eb4ab49e-d07b-460e-8b33-c0e3450edf41
I see the current "interval" strategy for the nerf-renderer uses at most B // 2 samples for training. Do you think it would be better to just randomly select `train_max_nums` samples?
I have also implemented this by randomly selecting a continuous interval that contains `train_max_nums` samples. This method yields better results than the current implementation, and I plan to update it later.
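The interval variant can be sketched as follows; this is a hypothetical helper, not the actual implementation. A contiguous slice may work better than scattered selection because it tends to keep samples from the same rays together under a packed sample layout.

```python
import random

def select_grad_interval(num_points, train_max_nums, seed=None):
    """Pick a random contiguous interval of samples to keep gradients for.

    Returns a half-open (start, end) range with end - start <= train_max_nums;
    samples outside the interval would be rendered detached.
    """
    if num_points <= train_max_nums:
        return 0, num_points
    rng = random.Random(seed)
    start = rng.randrange(num_points - train_max_nums + 1)
    return start, start + train_max_nums
```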
This will be implemented in extensions.