
Memory usage grows too high and training never starts when using TensorFlow 1.4

Open sunume opened this issue 8 years ago • 4 comments

When training on CIFAR, memory usage keeps increasing and training never begins. I have 2×16 GB of RAM, which seems like it should be enough.

1× GTX 1080, cuDNN 6, TensorFlow 1.4, Python 3.5

sunume avatar Oct 31 '17 06:10 sunume

I met the same problem.

taufikxu avatar Dec 07 '17 22:12 taufikxu

See tensorflow/tensorflow#12598

pesser avatar Dec 10 '17 15:12 pesser

As @pesser pointed out, the problem is caused by the broken data-dependent initialization mechanism.

I implemented an alternative, more intuitive way of doing data-dependent initialization a while ago. I've also just tried to merge my mechanism with the current PixelCNN++ code; please see https://github.com/kolesman/pixel-cnn.

I haven't checked the code extensively, but it seems to work. Let me know whether it works for you as well, and then I'll create a pull request.
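For anyone unfamiliar with what "data-dependent initialization" means here: PixelCNN++ uses weight normalization, where the scale and bias of each layer are initialized from the statistics of a first data batch so that initial outputs have zero mean and unit variance. A minimal NumPy sketch of the idea, using a hypothetical dense layer (not the actual repo code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense layer x @ V with weight-norm parameters g (scale), b (bias).
x = rng.normal(size=(64, 10))       # one data batch, used only for initialization
V = rng.normal(size=(10, 4))
V = V / np.linalg.norm(V, axis=0)   # weight direction, unit norm per output unit

t = x @ V                           # pre-activations on the init batch
mu, sigma = t.mean(axis=0), t.std(axis=0)

g = 1.0 / sigma                     # data-dependent scale
b = -mu / sigma                     # data-dependent bias

y = g * t + b                       # initialized output: ~zero mean, unit variance
```

The bug referenced above is in how this one-time initialization pass interacts with the TensorFlow graph, not in the math itself.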

kolesman avatar Dec 11 '17 02:12 kolesman

Just how memory intensive is PixelCNN++?

I've been fine training smaller models, but now that I've hit a wall I want to know exactly where and how the memory is being allocated.

I am currently training on images of size 512×512 with batch size = 5 and num_filters = 32.

I received a number of different errors:

OP_REQUIRES failed at cwise_ops_common.h:120 :Resource exhausted: OOM when allocating tensor with shape[5,64,256,256]

OP_REQUIRES failed at random_op.cc:77 : Resource exhausted: OOM when allocating tensor with shape[5,512,512,64]

etc...

I don't fully understand the shapes of these tensors. I can see the batch size in there, and when I change the number of filters the 64 changes as well.
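For a rough sense of scale: those shapes look like activation tensors (batch, spatial dimensions, and channels; the 64 is plausibly 2 × num_filters from the gated activations, though that's a guess). A back-of-the-envelope sketch, assuming float32 (4 bytes per element), shows why they add up quickly:

```python
def tensor_bytes(shape, dtype_bytes=4):
    """Bytes needed for one dense tensor of the given shape (float32 by default)."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

# The two OOM shapes from the errors above, in MiB:
print(tensor_bytes([5, 64, 256, 256]) / 2**20)   # → 80.0
print(tensor_bytes([5, 512, 512, 64]) / 2**20)   # → 320.0
```

And that's a single activation tensor; a deep network keeps many of these alive at once (forward activations plus their gradients), so hundreds of such tensors at 512×512 resolution can exhaust a GTX 1080's 8 GB even at batch size 5.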

Any help would be much appreciated!

SammyGelman avatar Dec 15 '21 08:12 SammyGelman