
CUDA out of memory

Open BLCKEAGLE4 opened this issue 3 years ago • 2 comments

Hi, your work is very inspiring! I get errors when I run the training script and then the test script, and I don't know what I'm doing wrong. I would be grateful if anyone could help.

(venv) amperiad@cuda-pc:~/iSeeBetter-master$ python3 iSeeBetterTrain.py
[ INFO] ==> Loading datasets
Training samples chosen: foliage_test.txt
[ INFO] # of Generator parameters: 12771943
[ INFO] # of Discriminator parameters: 5215425
[ INFO] # of CUDA devices detected: 1
[ INFO] Using CUDA device #: 0
[ INFO] CUDA device name: GeForce GTX TITAN X
[ INFO] Generator Loss: L1 Loss
[ INFO] ------------- iSeeBetter Network Architecture -------------
[ INFO] ----------------- Generator Architecture ------------------
...

[ INFO] Total number of parameters: 5215425
[ INFO] -----------------------------------------------------------
0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "iSeeBetterTrain.py", line 264, in <module>
    main()
  File "iSeeBetterTrain.py", line 258, in main
    runningResults = trainModel(epoch, training_data_loader, netG, netD, optimizerD, optimizerG, generatorCriterion, device, args)
  File "iSeeBetterTrain.py", line 61, in trainModel
    next(iterTrainBar)
  File "/home/amperiad/venv/lib/python3.6/site-packages/tqdm/std.py", line 1087, in __iter__
    for obj in iterable:
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
NotADirectoryError: Caught NotADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/amperiad/iSeeBetter-master/dataset.py", line 204, in __getitem__
    target, input, neigbor = load_img_future(self.image_filenames[index], self.nFrames, self.upscale_factor, self.other_dataset,self.upscale_only)
  File "/home/amperiad/iSeeBetter-master/dataset.py", line 94, in load_img_future
    target = modcrop(Image.open(join(filepath,'im4.png')).convert('RGB'),scale)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2548, in open
    fp = builtins.open(filename, "rb")
NotADirectoryError: [Errno 20] Not a directory: './Vid4/foliage/001.png/im4.png'

(venv) amperiad@cuda-pc:~/iSeeBetter-master$ python3 iSeeBetterTest.py -o output.txt -c --data_dir ./Vid4 --file_list foliage_test.txt -u
Namespace(chop_forward=False, data_dir='./Vid4', debug=False, file_list='foliage_test.txt', future_frame=True, gpu_mode=True, gpus=1, model='weights/netG_epoch_4_1.pth', model_type='RBPN', nFrames=7, other_dataset=True, output='output.txt', residual=False, seed=123, testBatchSize=1, threads=1, upscale_factor=4, upscale_only=True)
Using GPU mode
==> Loading datasets
==> Building model RBPN
[ INFO] ------------- iSeeBetter Network Architecture -------------
[ INFO] ----------------- Generator Architecture ------------------
[ INFO] DataParallel(

...
[ INFO] Total number of parameters: 12771943
Pre-trained SR model loaded from: weights/netG_epoch_4_1.pth
Traceback (most recent call last):
  File "iSeeBetterTest.py", line 197, in <module>
    eval()
  File "iSeeBetterTest.py", line 107, in eval
    prediction = model(input, neigbor, flow)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amperiad/iSeeBetter-master/rbpn.py", line 82, in forward
    h0 = self.DBPN(feat_input)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amperiad/iSeeBetter-master/dbpns.py", line 55, in forward
    x = self.output(torch.cat((h3, h2, h1),1))
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amperiad/iSeeBetter-master/base_networks.py", line 66, in forward
    out = self.conv(x)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/amperiad/venv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 1.32 GiB (GPU 0; 11.92 GiB total capacity; 10.46 GiB already allocated; 635.38 MiB free; 280.01 MiB cached)

BLCKEAGLE4, Apr 08 '21 13:04

Same here

AwaleSajil, Apr 27 '21 08:04

I have the same problem. I changed '--gpus' to 3, but only one GPU is used and I get the same errors.

sunyclj, Nov 29 '21 10:11