stochastic_depth_keras
Dealing with memory limitation
Hi, following your advice regarding setting the recursion limit, I managed to get this to run with N = 17. My Windows machine has 16GB of RAM, using the Theano backend. Higher than that, Python crashes.
Here are my questions:
- Is this a result of a GPU or CPU RAM limit?
- Is there a way to deal with it effectively, besides getting more RAM?
- How much RAM do you have on the machines that allow reaching N = 50?
- Does it matter whether you use the TensorFlow or Theano backend?
Thank you!
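For reference, the recursion-limit workaround being discussed is a one-liner placed before the model is built. This is a minimal sketch under the assumption that the crash comes from Theano's very deep symbolic graphs overflowing Python's default recursion limit during graph traversal (the `2 ** 20` value is the one quoted later in this thread):

```python
import sys

# Theano represents a deep residual network as one long chain of graph
# nodes, and compiling it traverses that chain recursively. Python's
# default recursion limit (typically 1000) is far too small for N = 50,
# so it must be raised before building the model.
sys.setrecursionlimit(2 ** 20)
```

Note this only removes Python's artificial limit; it does not reduce the actual memory the graph needs, so a RAM-bound crash can still occur afterwards.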
- I don't know whether it's because of a RAM limit. Could you show the error message?
- You should try a shallower model first.
- My machine has 8GB of RAM and 4GB of VRAM (GTX 970).
- This code doesn't work with TensorFlow because of an incompatibility in how it accesses subtensors.
Recently, new Keras features (keras-1) have been merged into master, and there have been lots of modifications. I'll run this code with the latest version of Keras on both Theano and TensorFlow and check whether it still works.
Thank you for your report.
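For anyone wondering what the subtensor incompatibility refers to: Theano's `T.set_subtensor` is a functional update, i.e. it returns a new symbolic tensor with a slice replaced, while TensorFlow at the time only offered in-place scatter ops on variables. A plain-numpy sketch of the functional-update semantics (the function name is mine, purely illustrative):

```python
import numpy as np

def set_subtensor_functional(x, idx, value):
    # Theano-style semantics: return a NEW array with the given slice
    # replaced, leaving the input untouched. This is what makes the
    # operation safe to use on arbitrary intermediate graph tensors.
    out = x.copy()
    out[idx] = value
    return out

x = np.arange(4.0)                                   # [0., 1., 2., 3.]
y = set_subtensor_functional(x, slice(0, 2), 0.0)    # [0., 0., 2., 3.]
# x itself is unchanged, unlike an in-place scatter update.
```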
Well, I've tried it on two different machines and I'm getting what I believe to be the same error. The weaker one is a laptop with 16GB of RAM and a 960M; the stronger one has 16GB of RAM and a 980 Ti. From what I understand, you are able to use N = 50 with setrecursionlimit(2 ** 20) on an 8GB machine, so this is probably not really a RAM limitation problem.
Quite unfortunately, it's hard for me to pinpoint the exact error: I only get an unhandled win32 exception in python.exe. I can run the debugger and pinpoint the exact line of code if you want.
This looks like a Windows bug to me.
@RaananHadar I can compile the computation graph on 32GB RAM machines. The compile time for N=50 is just intolerable, 4 hours. I've seen it go above 50% RAM utilization, so it's not surprising that it crashes for you. (We're talking about Theano, of course; TensorFlow will probably do it in a few minutes. But for that, you have to rewrite parts of the code. And unless you rewrite it very thoroughly, the training's going to be very slow, with all the gate variable settings.)
But even if you manage to compile the computation graph, you'll have problems with the actual training. We have a few 4GB GTX 980s and two 12GB Titan Xs. The GTX 980 dies with out-of-GPU-memory for N=50. I have not yet tried N=50 on my 12GB GTX Titan X, because that would probably kill all the N=18 experiments we are normally running. (N=18 runs just fine on the GTX 980.)
@danielvarga Thank you for sharing your experience. So you are saying that the most elegant solution is ultimately to port the code to TensorFlow?
@RaananHadar With current TH and TF it's a trade-off, really. With TF, you save lots and lots of compile time, and can fit into your 16GB core RAM. But I don't think you can make the TF version as fast as the TH version, even if you optimize the gate variable assignments to death. In a few months, maybe. Disclaimer: I have way more experience with TH than with TF, a TF expert might disagree.
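To make the "gate variable assignments" concrete: in stochastic depth, each residual block l gets a survival probability p_l, linearly decayed from 1 down to p_L as in Huang et al. (2016), and a fresh Bernoulli draw per block before every minibatch decides which blocks are active. A minimal numpy sketch of that sampling (the names and the `p_last` default are my own; in the Theano version each draw would be written into a gate shared variable before the batch, which is exactly the per-batch overhead discussed above):

```python
import numpy as np

def sample_gates(num_blocks, p_last=0.5, rng=None):
    # Linear-decay rule from the stochastic depth paper: the first
    # block survives with probability close to 1, the last one with
    # probability p_last.
    if rng is None:
        rng = np.random.default_rng()
    l = np.arange(1, num_blocks + 1)
    p = 1.0 - (l / num_blocks) * (1.0 - p_last)
    # One Bernoulli draw per block, redrawn before every minibatch.
    # With Theano, each of these values would be pushed into a gate
    # shared variable; a TF port would need an equivalent mechanism.
    gates = rng.binomial(1, p).astype(np.float32)
    return gates, p

gates, p = sample_gates(18)   # one gate per residual block for N=18
```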