
Memory usage grows too high and training never starts when using TensorFlow 1.4

Open sunume opened this issue 8 years ago • 4 comments

When training on CIFAR, memory usage keeps increasing and training never begins. I have 2×16 GB of RAM, which seems like it should be enough.

1× GTX 1080, cuDNN 6, TensorFlow 1.4, Python 3.5

sunume avatar Oct 31 '17 06:10 sunume

I met the same problem.

taufikxu avatar Dec 07 '17 22:12 taufikxu

See tensorflow/tensorflow#12598

pesser avatar Dec 10 '17 15:12 pesser

As @pesser pointed out, the problem is caused by the broken data-dependent initialization mechanism.

I implemented an alternative, more intuitive way of doing data-dependent initialization a while ago. I've also just tried to merge my mechanism with the current PixelCNN++ code; please see https://github.com/kolesman/pixel-cnn.

I haven't checked the code extensively, but it seems to work. Let me know whether it works for you as well, and then I'll create a pull request.
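For anyone unfamiliar with what "data-dependent initialization" means here: PixelCNN++ uses weight normalization, where the scale and bias of each layer are initialized from the statistics of a first data batch so that initial outputs have zero mean and unit variance. A minimal NumPy sketch of the idea, using a hypothetical dense layer (not the actual repo code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense layer x @ V with weight-norm parameters g (scale), b (bias).
x = rng.normal(size=(64, 10))       # one data batch, used only for initialization
V = rng.normal(size=(10, 4))
V = V / np.linalg.norm(V, axis=0)   # weight direction, unit norm per output unit

t = x @ V                           # pre-activations on the init batch
mu, sigma = t.mean(axis=0), t.std(axis=0)

g = 1.0 / sigma                     # data-dependent scale
b = -mu / sigma                     # data-dependent bias

y = g * t + b                       # initialized output: ~zero mean, unit variance
```

The bug referenced above is in how this one-time initialization pass interacts with the TensorFlow graph, not in the math itself.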

kolesman avatar Dec 11 '17 02:12 kolesman

Just how memory intensive is PixelCNN++?

I've been fine training smaller models, but now that I've hit a wall I want to know exactly where and how the memory is being allocated.

I am currently training on images of size 512×512 with batch size = 5 and num_filters = 32.

I received a number of different errors:

OP_REQUIRES failed at cwise_ops_common.h:120 :Resource exhausted: OOM when allocating tensor with shape[5,64,256,256]

OP_REQUIRES failed at random_op.cc:77 : Resource exhausted: OOM when allocating tensor with shape[5,512,512,64]

etc...

I don't fully understand the shapes of these tensors. I can see the batch size in there, and when I change the number of filters the 64 changes as well.
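For a rough sense of scale: those shapes look like activation tensors (batch, spatial dimensions, and channels; the 64 is plausibly 2 × num_filters from the gated activations, though that's a guess). A back-of-the-envelope sketch, assuming float32 (4 bytes per element), shows why they add up quickly:

```python
def tensor_bytes(shape, dtype_bytes=4):
    """Bytes needed for one dense tensor of the given shape (float32 by default)."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

# The two OOM shapes from the errors above, in MiB:
print(tensor_bytes([5, 64, 256, 256]) / 2**20)   # → 80.0
print(tensor_bytes([5, 512, 512, 64]) / 2**20)   # → 320.0
```

And that's a single activation tensor; a deep network keeps many of these alive at once (forward activations plus their gradients), so hundreds of such tensors at 512×512 resolution can exhaust a GTX 1080's 8 GB even at batch size 5.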

Any help would be much appreciated!

SammyGelman avatar Dec 15 '21 08:12 SammyGelman