threestudio
Run Zero123 under 13GB RAM
Trying to run zero123 on the Colab free tier fails because loading the model uses up all 12.7GB of RAM and crashes. Techniques that avoid loading the full model into CPU RAM on the way to the GPU would unlock broader use of this exciting model.
Hi, @generatorman. We are actively addressing this issue, and you can refer to this pull request for more details. You can also consider reducing num_samples_per_ray to 256 and downsampling the resolution of the images by adjusting the width and height parameters.
Thank you for the response. The PR you linked to seems related to VRAM usage - the issue I'm facing is with RAM. For example, running the following command quickly uses up 13GB of RAM and crashes, without using any VRAM at all.
!python launch.py --config configs/zero123.yaml --train --gpu 0 system.renderer.num_samples_per_ray=256 data.width=64 data.height=64
So currently it's bottlenecked by RAM usage rather than VRAM usage. Is there any quick fix I could apply?
I think loading the Zero123 guidance model requires a lot of RAM. To address this, you could consider changing the torch.load(..., map_location='cpu') call to torch.load(..., map_location='cuda:0'), which could potentially alleviate the memory consumption. Another option is to load an fp16 model instead of an fp32 model.
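For reference, a minimal sketch of that approach; the checkpoint path and the stand-in module below are placeholders for illustration, not the exact names threestudio uses:

```python
import torch
import torch.nn as nn

# Stand-in module for illustration only; in threestudio this would be the
# Zero123 guidance model built from the config.
model = nn.Linear(4, 4).to("cuda:0")

ckpt_path = "path/to/zero123.ckpt"  # placeholder checkpoint path

# map_location="cuda:0" deserializes tensors straight into GPU memory,
# so the full fp32 state dict never has to be staged in CPU RAM first.
state = torch.load(ckpt_path, map_location="cuda:0")
state_dict = state.get("state_dict", state)
model.load_state_dict(state_dict, strict=False)

# Optional: casting to fp16 roughly halves the memory footprint.
model = model.half()
```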
@generatorman Honestly, you're going to need plenty of RAM and VRAM to run this kind of model. It's inevitable at this stage. Over time the efficiency of the code will probably improve, but for now, you need a good GPU and a powerful system.
I recommend we close this issue for now.
Any idea what minimum GPU model would be required?
24GB is not enough. I ran it on an NVIDIA A10 and it failed with OOM:
return self._call_impl(*args, **kwargs)
File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacty of 22.02 GiB of which 85.19 MiB is free. Process 22828 has 21.93 GiB memory in use. Of the allocated memory 19.16 GiB is allocated by PyTorch, and 299.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
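As a side note, the allocator hint in that error message can be tried, though it only mitigates fragmentation rather than an outright shortage of VRAM. A minimal sketch (the value 128 is just an example, not a recommendation from the threestudio authors):

```python
import os

# Must be set before the first CUDA allocation (or exported in the shell
# before running launch.py); limits allocator block splitting to reduce
# fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # the caching allocator reads this setting on first CUDA use
```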
It seems 40GB of VRAM is enough. I ran it on an A100 40G successfully, and nvidia-smi shows 32-39GB of VRAM in use.
| Name | Type | Params
-------------------------------------------------------------
0 | geometry | ImplicitVolume | 12.6 M
1 | material | DiffuseWithPointLightMaterial | 0
2 | background | SolidColorBackground | 0
3 | renderer | NeRFVolumeRenderer | 0
-------------------------------------------------------------
12.6 M Trainable params
0 Non-trainable params
12.6 M Total params
50.450 Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123/[64, 128, 256]_1_clipdrop-background-removal.png_prog0@20231028-091058/save
[INFO] Loading Zero123 ...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
100%|███████████████████████████████████████| 890M/890M [00:56<00:00, 16.6MiB/s]
[INFO] Loaded Zero123!
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
Epoch 0: 200/? [02:10<00:00, 1.53it/s, train/loss=11.20]