SelfReconCode
CUDA out of memory
Hello, I've been training for a while, but an error is reported partway through. Is there any way to solve this problem without changing the graphics card?
scene data use female smpl
/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
camera ang threshold is 0.010285
box:
[-0.7080196142196655, -1.2795634269714355, -0.3215314447879791]
[0.7120546102523804, 0.7051210403442383, 0.3668109178543091]
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:246: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
stride = (self.resolutions[-1] - 1) // (resolution - 1)
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:261: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
coords_accum = coords // stride
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:341: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
voxels = coords // stride
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:381: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
point_coords = coords // stride
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:417: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
voxels = coords // stride
Traceback (most recent call last):
File "train.py", line 167, in
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xds/project/SelfReconCode/model/network.py", line 502, in forward
total_loss=self.computeTmpPcLoss(defMeshes,[d_cond,[poses,trans]],masks,mgtMs,ratio)
File "/home/xds/project/SelfReconCode/model/network.py", line 687, in computeTmpPcLoss
loss.backward()
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/autograd/function.py", line 199, in apply
return user_fn(self, *args)
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/compositing.py", line 56, in backward
grad_features, grad_alphas = _C.accum_alphacomposite_backward(
RuntimeError: CUDA out of memory. Tried to allocate 668.00 MiB (GPU 0; 10.76 GiB total capacity; 8.00 GiB already allocated; 443.38 MiB free; 8.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
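The figures in that last message can be read off directly. As a rough sketch (the helper `parse_oom` is made up for this note and is not part of PyTorch or SelfReconCode, and the regexes only cover the message format shown above), comparing reserved and allocated memory hints at how much fragmentation could be contributing:

```python
import re

# Conversion factors to MiB for the units that appear in the message.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract the memory figures (in MiB) from a PyTorch CUDA OOM message.

    Hypothetical helper written for illustration only.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.groups()
            stats[key] = float(value) * _UNITS[unit]
    return stats


msg = (
    "CUDA out of memory. Tried to allocate 668.00 MiB (GPU 0; 10.76 GiB total capacity; "
    "8.00 GiB already allocated; 443.38 MiB free; 8.18 GiB reserved in total by PyTorch)"
)
stats = parse_oom(msg)

# Memory PyTorch holds in its cache but is not currently using.
cached_but_unused = stats["reserved"] - stats["allocated"]
print(f"requested: {stats['requested']:.0f} MiB")
print(f"reserved but not allocated: {cached_but_unused:.0f} MiB")
print(f"free on device: {stats['free']:.0f} MiB")
```

Here the cached-but-unused amount is small, and even added to the free memory it stays below the 668 MiB requested, so allocator tuning such as max_split_size_mb is unlikely to be enough on its own for this run.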
The default config needs a lot of GPU memory, and an RTX 3090 is recommended. You can lower the marching cubes resolutions to reduce memory use, but the related optimization parameters then need to be readjusted as well, which is a little tedious.
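To give a feel for why the resolution matters, here is a back-of-the-envelope sketch. It assumes a dense float32 grid with one value per sample, which is a simplification: the MCAcc code in the log appears to work across several resolutions (it refers to `self.resolutions`), so its real footprint will differ, and the resolutions below are illustrative rather than taken from config.conf. The point is only the cubic scaling.

```python
def dense_grid_mib(resolution, bytes_per_value=4):
    """Memory in MiB for a dense resolution^3 grid of scalar values.

    Illustrative helper; assumes float32 (4 bytes) unless told otherwise.
    """
    return resolution ** 3 * bytes_per_value / 2 ** 20


# Doubling the resolution multiplies the footprint by eight.
for res in (128, 256, 512):
    print(f"{res}^3 grid: {dense_grid_mib(res):.0f} MiB")
```

Any tensors kept for the backward pass scale with the number of samples too, so halving a resolution can free far more than the grid itself.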
Thank you! I will try to adjust the parameters and hope it works.
Do you know how much memory is needed?
Almost 24 GB.
I'm using a GeForce RTX 3070 Laptop GPU and got the same error, shown below. I edited config.conf a bit, reducing "sample_pix_num", "num_workers", and "batch_size", but it still fails. Which parameters should I edit to avoid the CUDA out of memory error?
Error message:
$ CUDA_VISIBLE_DEVICES=0 python train.py --gpu-ids 0 --conf config.conf --data $ROOT/female-3-casual --save-folder result
scene data use female smpl
/home/mas/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
File "train.py", line 98, in
zhihu
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
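A note on placement: allocator options from PYTORCH_CUDA_ALLOC_CONF need to be in the environment before PyTorch first touches CUDA, so the safest spot is the very top of the entry script, ahead of `import torch`. A minimal sketch, with 128 taken from the suggestion above rather than being a tuned value:

```python
import os

# Set the allocator option before anything initializes CUDA.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

# import torch  # the script's remaining imports would follow from here

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This setting only limits fragmentation in PyTorch's caching allocator. It does not lower the amount of memory the model actually needs.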
Thank you, will try.
I put
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
at line 27 in train.py and ran
CUDA_VISIBLE_DEVICES=0 python train.py --gpu-ids 0 --conf config.conf --data $ROOT/female-3-casual --save-folder result
But it failed with "Segmentation fault (core dumped)" ...