threestudio
Run Zero123 under 13GB RAM
Trying to run zero123 on the Colab free tier fails because loading the model uses up all 12.7GB of RAM and crashes. Techniques that avoid loading the full model into CPU RAM on the way to the GPU would unlock broader use of this exciting model.
Hi, @generatorman. We are actively addressing this issue, and you can refer to this pull request for more details. You can also consider reducing num_samples_per_ray to 256 and downsampling the resolution of the images by adjusting the width and height parameters.
Thank you for the response. The PR you linked to seems related to VRAM usage - the issue I'm facing is with RAM. For example, running the following command quickly uses up 13GB of RAM and crashes, without using any VRAM at all.
!python launch.py --config configs/zero123.yaml --train --gpu 0 system.renderer.num_samples_per_ray=256 data.width=64 data.height=64
So currently it's bottlenecked by RAM usage rather than VRAM usage. Is there any quick fix I could apply?
I think loading the Zero123 guidance model requires a lot of RAM. To address this, you could consider changing the torch.load(..., map_location='cpu') call to torch.load(..., map_location='cuda:0'), which could potentially alleviate the memory consumption. Another option is to load an fp16 model instead of an fp32 model.
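For reference, a minimal sketch of that approach; the checkpoint path and the stand-in module below are placeholders for illustration, not the exact names threestudio uses:

```python
import torch
import torch.nn as nn

# Stand-in module for illustration only; in threestudio this would be the
# Zero123 guidance model built from the config.
model = nn.Linear(4, 4).to("cuda:0")

ckpt_path = "path/to/zero123.ckpt"  # placeholder checkpoint path

# map_location="cuda:0" deserializes tensors straight into GPU memory,
# so the full fp32 state dict never has to be staged in CPU RAM first.
state = torch.load(ckpt_path, map_location="cuda:0")
state_dict = state.get("state_dict", state)
model.load_state_dict(state_dict, strict=False)

# Optional: casting to fp16 roughly halves the memory footprint.
model = model.half()
```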
@generatorman Honestly, you're going to need plenty of RAM and VRAM to run this kind of model. It's inevitable at this stage. Over time the efficiency of the code will probably improve, but for now, you need a good GPU and a powerful system.
I recommend we close this issue for now.
Any idea what minimum GPU model would be required?
24GB is not enough. I ran it on an NVIDIA A10 and it failed with OOM:
return self._call_impl(*args, **kwargs)
File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacty of 22.02 GiB of which 85.19 MiB is free. Process 22828 has 21.93 GiB memory in use. Of the allocated memory 19.16 GiB is allocated by PyTorch, and 299.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
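As a side note, the allocator hint in that error message can be tried, though it only mitigates fragmentation rather than an outright shortage of VRAM. A minimal sketch (the value 128 is just an example, not a recommendation from the threestudio authors):

```python
import os

# Must be set before the first CUDA allocation (or exported in the shell
# before running launch.py); limits allocator block splitting to reduce
# fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # the caching allocator reads this setting on first CUDA use
```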
It seems 40GB of VRAM is enough. I ran it on an A100 40G successfully, and nvidia-smi shows 32-39GB of VRAM in use.
| Name | Type | Params
-------------------------------------------------------------
0 | geometry | ImplicitVolume | 12.6 M
1 | material | DiffuseWithPointLightMaterial | 0
2 | background | SolidColorBackground | 0
3 | renderer | NeRFVolumeRenderer | 0
-------------------------------------------------------------
12.6 M Trainable params
0 Non-trainable params
12.6 M Total params
50.450 Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123/[64, 128, 256]_1_clipdrop-background-removal.png_prog0@20231028-091058/save
[INFO] Loading Zero123 ...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
100%|███████████████████████████████████████| 890M/890M [00:56<00:00, 16.6MiB/s]
[INFO] Loaded Zero123!
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
Epoch 0: 200/? [02:10<00:00, 1.53it/s, train/loss=11.20]