STANet

Memory error

Open sudipansaha opened this issue 4 years ago • 5 comments

I get a memory error for both BAM and PAM at line 169 of train.py. Does it really need that much memory or am I missing something?

Traceback (most recent call last):
  File "train.py", line 169, in <module>
    miou_current = val(opt, model)
  File "train.py", line 86, in val
    score = model.test(val=True)  # run inference
  File "/home/supervisedCD/STANet/models/CDFA_model.py", line 72, in test
    self.forward()
  File "/home/supervisedCD/STANet/models/CDFA_model.py", line 90, in forward
    self.feat_A, self.feat_B = self.netA(self.feat_A,self.feat_B)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/supervisedCD/STANet/models/backbone.py", line 46, in forward
    x = self.Self_Att(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/supervisedCD/STANet/models/BAM.py", line 37, in forward
    energy = torch.bmm(proj_query, proj_key)  # transpose check
RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 23.70 GiB total capacity; 825.29 MiB already allocated; 18.55 GiB free; 3.90 GiB reserved in total by PyTorch)

sudipansaha avatar Sep 13 '21 13:09 sudipansaha
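
For anyone wondering where a 256 GiB request can come from: the failing line is the `torch.bmm(proj_query, proj_key)` call in BAM.py. Assuming the usual self-attention layout, where the energy tensor has shape (batch, N, N) with N = feature height × feature width, its size grows with the square of the number of spatial positions. The sketch below is plain arithmetic, not code from the repository. The function name and the example feature-map sizes are assumptions for illustration; the thread does not state the actual batch size or feature resolution.

```python
GIB = 2 ** 30
MIB = 2 ** 20


def energy_matrix_bytes(batch, feat_h, feat_w, bytes_per_element=4):
    """Bytes for a (batch, N, N) attention energy tensor, N = feat_h * feat_w.

    bytes_per_element=4 assumes float32.
    """
    n = feat_h * feat_w
    return batch * n * n * bytes_per_element


# One combination that reproduces the 256 GiB request (an assumption,
# other batch/size combinations give the same total).
print(energy_matrix_bytes(1, 512, 512) / GIB, "GiB")

# Shrinking the feature map shrinks the tensor quadratically in N.
for side in (256, 128, 64):
    size = energy_matrix_bytes(1, side, side)
    print(f"{side} x {side} positions: {size / MIB:.0f} MiB")
```

Halving each side of the feature map divides the energy tensor by 16, so feeding much smaller images to the attention module is what brings the request back within GPU memory.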

I also encountered the same problem in the same place recently. Have you solved it now?

weizhiliang0520 avatar Dec 08 '21 06:12 weizhiliang0520

Good news: shortly after posting my previous comment I got it running. Validation runs automatically after one epoch of training, but the validation path contains no cropping code, so you need to crop the val images to 256 × 256 yourself before training.

weizhiliang0520 avatar Dec 08 '21 06:12 weizhiliang0520

How do I crop the images? Did the author provide code for this? And how should the cropped images be named?

bsyyhpc avatar Dec 20 '21 06:12 bsyyhpc

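You can take a look at my repository, where I reimplemented the image cropping. The cropped images are saved in the same subdirectories as in the original data folder. https://github.com/kangpeilun/utils-for-img-process/tree/main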

kangpeilun avatar Mar 18 '22 04:03 kangpeilun
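
For readers who only need the idea, here is a minimal sketch of tiling an image into 256 × 256 patches. It is not the code from the repository linked above. The function names and the `_r{row}_c{col}` naming scheme are hypothetical, chosen only so that tiles cut from matching source files get matching names. The file-level helper relies on Pillow, a third-party package.

```python
from pathlib import Path


def tile_boxes(width, height, tile=256):
    """Return (left, upper, right, lower) boxes for non-overlapping tiles.

    Leftover border pixels are dropped when a side is not a multiple of tile.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


def tile_name(src_name, row, col):
    """Hypothetical naming scheme: 'train_1.png' -> 'train_1_r0_c1.png'."""
    p = Path(src_name)
    return f"{p.stem}_r{row}_c{col}{p.suffix}"


def crop_image_file(src_path, out_dir, tile=256):
    """Cut one image file into tiles and save them in out_dir."""
    from PIL import Image  # third-party: pip install Pillow

    src_path, out_dir = Path(src_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with Image.open(src_path) as img:
        for left, top, right, bottom in tile_boxes(img.width, img.height, tile):
            name = tile_name(src_path.name, top // tile, left // tile)
            img.crop((left, top, right, bottom)).save(out_dir / name)
```

If the dataset keeps the two acquisition dates and the labels in parallel folders, running `crop_image_file` on each folder with the same tile size keeps the tile names aligned across them.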

That's cool. Thank you.

TingfengXian avatar Mar 22 '23 01:03 TingfengXian