TensoRF

Error when running with TensorCP

Open greeneggsandyaml opened this issue 2 years ago • 3 comments

Hello authors, thank you for your great work.

I am trying to train a model on the lego dataset with TensorCP, but it is not working. It trains, but the PSNR does not increase above 9, and then it errors out when updating the alpha mask:

...
initial TV_weight density: 0.0 appearance: 0.0
Iteration 02000: train_psnr = 9.34 test_psnr = 0.00 mse = 0.121434:   7%|█████▎                                                                         | 2000/30000 [00:31<07:25, 62.87it/s]
Traceback (most recent call last):
  File "train.py", line 301, in <module>
    reconstruction(args)
  File "train.py", line 234, in reconstruction
    new_aabb = tensorf.updateAlphaMask(tuple(reso_mask))
  File "/users/lukemk/miniconda3/envs/new/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/tritonhomes/lukemk/projects/experiments/active/TensoRF/models/tensorBase.py", line 330, in updateAlphaMask
    xyz_min = valid_xyz.amin(0)
IndexError: amin(): Expected reduction dim 0 to have non-zero size.

For context, I can run with TensorVMSplit without any issues (and I get the expected PSNR). I probably just missed some parameter that needs to be passed to make TensorCP work.
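In case it helps with diagnosis: the IndexError seems to mean that no voxels survive the alpha threshold at the 2000-iteration mask update, so valid_xyz is empty by the time amin(0) is called. As a purely local workaround on my side (just a sketch against the surrounding updateAlphaMask code, not a proposed fix; I am assuming self.aabb holds the current bounding box and that a matching amax(0) reduction follows), I guarded the reduction like this:

    # Sketch of a local workaround in models/tensorBase.py, updateAlphaMask():
    # skip shrinking the bounding box when no voxels pass the alpha threshold.
    if valid_xyz.shape[0] == 0:
        return self.aabb          # keep the previous bounding box instead of crashing
    xyz_min = valid_xyz.amin(0)   # this is the line that raised the IndexError
    xyz_max = valid_xyz.amax(0)   # assuming the matching max reduction follows here

That stops the crash, but of course it does not explain the low PSNR, which is why I suspect a missing TensorCP parameter.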

Thanks again for your work!

greeneggsandyaml avatar Mar 21 '22 16:03 greeneggsandyaml

Oh, I forgot to mention that I have updated the configuration for the CP model; please refer to the README for guidance. Thanks for pointing this out!
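For anyone landing here later: the CP settings live in the config file (e.g. configs/lego.txt) alongside the VM ones. Roughly, the relevant lines look like the block below; please take the actual component counts from the README, the numbers here are only placeholders:

    model_name = TensorCP
    n_lamb_sigma = [96]    # CP components for density (placeholder value)
    n_lamb_sh = [288]      # CP components for appearance (placeholder value)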

apchenstu avatar Mar 22 '22 03:03 apchenstu

I am getting this error after uncommenting the TensorCP settings and running the model. For context, I don't get any memory issues with the TensorVM model. Any ideas why?

RuntimeError: CUDA out of memory. Tried to allocate 484.00 MiB (GPU 0; 12.00 GiB total capacity; 8.99 GiB already allocated; 0 bytes free; 9.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
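One thing I might try first is the allocator hint from the error message itself, something like the line below (the 128 MiB split size is just a guess, and I am assuming the usual train.py --config invocation):

    PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py --config configs/lego.txt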

h-OUS-e avatar May 08 '22 16:05 h-OUS-e

Commented n_lamb_sigma and n_lamb_sh back out and it's working again, though not with the same great results. I guess they need to be lower for a 12 GB GPU, roughly along the lines of the sketch below.
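To be concrete about "lower", this is the kind of change I mean in the config file; the numbers are untested guesses for a 12 GB card, not recommended values:

    n_lamb_sigma = [48]    # fewer density components (untested guess)
    n_lamb_sh = [144]      # fewer appearance components (untested guess)
    batch_size = 2048      # a smaller ray batch also reduces peak memory (assuming this config key)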

h-OUS-e avatar May 08 '22 16:05 h-OUS-e