
RuntimeError: CUDA error: out of memory (Is there any problem with pytorch)

Open panduranga007 opened this issue 6 years ago • 7 comments

(TENSOR) C:\Users\Ravi\pix2pixHD>python train.py --name label2city_60p --batchSize 1 --label_nc 0 ------------ Options ------------- batchSize: 1 beta1: 0.5 checkpoints_dir: ./checkpoints continue_train: False data_type: 32 dataroot: ./datasets/cityscapes/ debug: False display_freq: 100 display_winsize: 512 feat_num: 3 fineSize: 512 gpu_ids: [0] input_nc: 3 instance_feat: False isTrain: True label_feat: False label_nc: 0 lambda_feat: 10.0 loadSize: 1024 load_features: False load_pretrain: lr: 0.0002 max_dataset_size: inf model: pix2pixHD nThreads: 2 n_blocks_global: 9 n_blocks_local: 3 n_clusters: 10 n_downsample_E: 4 n_downsample_global: 4 n_layers_D: 3 n_local_enhancers: 1 name: label2city_60p ndf: 64 nef: 16 netG: global ngf: 64 niter: 100 niter_decay: 100 niter_fix_global: 0 no_flip: False no_ganFeat_loss: False no_html: False no_instance: False no_lsgan: False no_vgg_loss: False norm: instance num_D: 2 output_nc: 3 phase: train pool_size: 0 print_freq: 100 resize_or_crop: scale_width save_epoch_freq: 10 save_latest_freq: 1000 serial_batches: False tf_log: False use_dropout: False verbose: False which_epoch: latest -------------- End ---------------- CustomDatasetDataLoader dataset [AlignedDataset] was created #training images = 4000 GlobalGenerator( (model): Sequential( (0): ReflectionPad2d((3, 3, 3, 3)) (1): Conv2d(4, 64, kernel_size=(7, 7), stride=(1, 1)) (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (6): ReLU(inplace) (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (9): ReLU(inplace) (10): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (11): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, 
track_running_stats=False) (12): ReLU(inplace) (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (14): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (15): ReLU(inplace) (16): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (17): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (18): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (19): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) 
(20): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (21): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (22): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (23): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (24): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): 
ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (25): ConvTranspose2d(1024, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (26): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (27): ReLU(inplace) (28): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (29): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (30): ReLU(inplace) (31): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (32): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (33): ReLU(inplace) (34): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (35): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (36): ReLU(inplace) (37): ReflectionPad2d((3, 3, 3, 3)) (38): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1)) (39): Tanh() ) )
Traceback (most recent call last):
  File "train.py", line 38, in <module>
    model = create_model(opt)
  File "C:\Users\Ravi\pix2pixHD\models\models.py", line 15, in create_model
    model.initialize(opt)
  File "C:\Users\Ravi\pix2pixHD\models\pix2pixHD_model.py", line 39, in initialize
    opt.n_blocks_local, opt.norm, gpu_ids=self.gpu_ids)
  File "C:\Users\Ravi\pix2pixHD\models\networks.py", line 44, in define_G
    netG.cuda(gpu_ids[0])
  File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply
    module._apply(fn)
  File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply
    module._apply(fn)
  File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply
    module._apply(fn)
  File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 191, in _apply
    param.data = fn(param.data)
  File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory

panduranga007 avatar Sep 26 '18 10:09 panduranga007

I got the same error.

ZubairKhan001 avatar Oct 03 '18 16:10 ZubairKhan001

How much memory does your GPU have? I think 12 GB should be OK.
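One way to answer that question from Python is sketched below. It is only a rough check, and the helper name `gpu_memory_report` is made up for this example. It assumes PyTorch is installed and returns an empty list when it is not, or when no CUDA device is visible.

```python
def gpu_memory_report():
    """Return a list of (device name, total memory in GiB) per visible CUDA device.

    Returns an empty list if PyTorch is missing or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    report = []
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        # total_memory is reported in bytes; convert to GiB for readability.
        report.append((props.name, props.total_memory / 1024 ** 3))
    return report


if __name__ == "__main__":
    devices = gpu_memory_report()
    if not devices:
        print("No CUDA device visible to PyTorch")
    for name, gib in devices:
        print(f"{name}: {gib:.1f} GiB total")
```

This reports total memory only. Memory already held by other processes on the same card is not subtracted.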

kleinyoni avatar Oct 03 '18 21:10 kleinyoni

@kleinyoni Do we need 12 GB on a single card, or can I use several video cards that add up to 12 GB or more?

AlexanderKozhevin avatar Dec 25 '18 14:12 AlexanderKozhevin

@AlexanderKozhevin I think a single card, since the primary data loading is done on one GPU (I think, not sure).

Try to use --ngf 32 instead.

It should use less memory.

You can also try running it in the cloud to see whether additional memory helps.
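To get a feel for how much --ngf matters, here is a small sketch that counts the weights of the GlobalGenerator from the layer shapes printed in the log above. It is an estimate, not code from the repo: the function names are made up, `input_nc=4` is read off the first printed layer `Conv2d(4, 64, ...)`, and the InstanceNorm layers are taken to have no parameters because the log shows `affine=False`.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a Conv2d or ConvTranspose2d with a k x k kernel."""
    return c_in * c_out * k * k + c_out


def global_generator_params(ngf=64, input_nc=4, output_nc=3,
                            n_downsample=4, n_blocks=9):
    """Parameter count for the generator layout shown in the log (estimate)."""
    total = conv_params(input_nc, ngf, 7)          # first 7x7 conv
    ch = ngf
    for _ in range(n_downsample):                  # stride-2 downsampling convs
        total += conv_params(ch, ch * 2, 3)
        ch *= 2
    total += n_blocks * 2 * conv_params(ch, ch, 3)  # two 3x3 convs per ResnetBlock
    for _ in range(n_downsample):                  # transposed convs back up
        total += conv_params(ch, ch // 2, 3)
        ch //= 2
    total += conv_params(ch, output_nc, 7)         # final 7x7 conv
    return total


if __name__ == "__main__":
    for ngf in (64, 32):
        n = global_generator_params(ngf=ngf)
        # 4 bytes per float32 weight; gradients and optimizer state come on top.
        print(f"ngf={ngf}: {n:,} parameters, {n * 4 / 2**20:.0f} MiB as float32")
```

Since almost all weights sit in convolutions whose input and output channel counts both scale with ngf, halving ngf cuts the weight count to roughly a quarter. This covers the generator weights only, not activations, the discriminators, or the VGG loss network.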

kleinyoni avatar Dec 25 '18 15:12 kleinyoni

The --resize_or_crop option has a great effect on memory usage for me. Passing --resize_or_crop scale_width worked for the default test images in the repo with an 11 GiB GTX 1080 Ti:
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop scale_width
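A back-of-the-envelope sketch of why input resolution matters so much: the size of a single float32 feature map grows with height times width. The helper `activation_mib` is made up for this example, and the two resolutions are assumptions, namely a full 2048x1024 Cityscapes frame versus the same frame scaled to a width of 1024 (which is what scale_width is assumed to do with loadSize 1024).

```python
def activation_mib(channels, height, width, batch=1, bytes_per_value=4):
    """Size in MiB of one feature map of shape (batch, channels, height, width)."""
    return batch * channels * height * width * bytes_per_value / 2 ** 20


if __name__ == "__main__":
    # Assumed resolutions: full Cityscapes frame vs. frame scaled to width 1024.
    for height, width in ((1024, 2048), (512, 1024)):
        size = activation_mib(channels=64, height=height, width=width)
        print(f"{width}x{height}, 64 channels: {size:.0f} MiB per feature map")
```

Halving both dimensions cuts every full-resolution feature map to a quarter of its size, and a network keeps many such maps alive at once.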

vobject avatar Jan 14 '19 16:01 vobject

The --resize_or_crop option has a great effect on memory usage for me. Passing --resize_or_crop scale_width worked for the default test images in the repo with an 11 GiB GTX 1080 Ti:
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop scale_width

Thank you. It does work.

sxudai avatar Apr 11 '19 09:04 sxudai

The --resize_or_crop option has a great effect on memory usage for me. Passing --resize_or_crop scale_width worked for the default test images in the repo with an 11 GiB GTX 1080 Ti:
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop scale_width

It really worked, thank you.

8067222151 avatar Sep 12 '22 07:09 8067222151