deeplearning4j-docs
DL4J memory page: review + improve docs/examples
https://deeplearning4j.org/memory#configuring-memory-limits
Mainly around maxphysicalbytes: should be set to heap + maxbytes
More examples (that specify the exact limits for all types of memory, and behaviour for GPUs) would be good.
Just quick pastes from Gitter, since family calls:
Point is, if you have (as per your example) -Xmx2G, maxbytes=8G and maxphysicalbytes=8G, then you ACTUALLY only have 6GB available for off-heap. If you use 2GB of JVM heap and 6GB off-heap, the "resident set" will be 8GB, and the maxphysicalbytes check will kick in. And that is on a very good day, BECAUSE the JVM itself also uses space in the resident set. So you will really only have 3-4GB available before the maxphysicalbytes check kicks in.
So, me following your documentation: I want 32GB heap and 10GB GPU. So I set (following your example, right?) -Xmx32G, maxbytes=10G and maxphysicalbytes=10G. What happens is that on the VERY FIRST off-heap allocation, the maxphysicalbytes check kicks in. This is because by then I had used 1GB for the JVM itself and 12GB of JVM heap (for the features), so I IMMEDIATELY got an error that said physicalbytes > maxphysicalbytes.
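The arithmetic in the two cases above can be sketched as follows. This is only an illustration of the numbers quoted in the chat, not how JavaCPP implements the check; the class `OffHeapHeadroom` and the method `remaining` are hypothetical names, and the JVM overhead figure is whatever you assume it to be.

```java
/** Sketch: off-heap headroom left before the maxphysicalbytes check trips. */
final class OffHeapHeadroom {
    static final long GB = 1024L * 1024L * 1024L;

    /**
     * Remaining off-heap budget in bytes. A result of zero or less means the
     * resident set is already at or over the limit, so the next off-heap
     * allocation is expected to fail.
     */
    static long remaining(long maxPhysicalBytes, long heapInUse, long jvmOverhead) {
        return maxPhysicalBytes - heapInUse - jvmOverhead;
    }

    public static void main(String[] args) {
        // Case 1: maxphysicalbytes=8G, 2GB of heap in use, JVM overhead ignored.
        System.out.println(remaining(8 * GB, 2 * GB, 0) / GB);        // prints 6
        // Case 2: maxphysicalbytes=10G, 12GB of heap in use, about 1GB JVM overhead.
        System.out.println(remaining(10 * GB, 12 * GB, 1 * GB) / GB); // prints -3
    }
}
```

A negative result in the second case matches what was observed: the limit is already exceeded before any off-heap memory is requested.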
So, basically, how this works out for dl4j: You need to set maxphysicalbytes to the highest number you want the process to take, including all three of: the JVM itself (allow for 1GB), the JVM heap (-Xmx), and the number of bytes you want to use on the GPU (which is maxbytes). The reason for including maxbytes in this number is that all GPU memory is mirrored in off-heap memory.
The only thing that will happen when you hit the maxphysicalbytes number is that your process will effectively crash with an OutOfMemoryError. So a sane number for this is basically all of your physical memory (CPU memory, on the main board, i.e. not including the GPU memory), maybe minus 1GB or so for the OS. The rationale is that what you want to avoid is the OS starting to swap memory to disk, resulting in "memory thrashing" and extremely bad performance, and that it is probably better to learn about this situation from your program crashing than to run an extremely inefficient machine learning process.
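As a sketch of that rule for the 32GB heap / 10GB GPU case: the snippet below adds up the three parts and prints a matching set of JVM flags. The class `MemoryFlags` and its methods are hypothetical helpers, and the 1GB JVM overhead is the allowance suggested above, not a measured value. The system property names `org.bytedeco.javacpp.maxbytes` and `org.bytedeco.javacpp.maxphysicalbytes` are the ones JavaCPP reads.

```java
/** Sketch: derive a lower bound for maxphysicalbytes and print the JVM flags. */
final class MemoryFlags {

    /** Lower bound for maxphysicalbytes, in GB: JVM overhead + heap + maxbytes. */
    static long minMaxPhysicalGb(long heapGb, long maxBytesGb, long jvmOverheadGb) {
        return jvmOverheadGb + heapGb + maxBytesGb;
    }

    /** Formats the three limits as JVM command-line flags. */
    static String flags(long heapGb, long maxBytesGb, long maxPhysicalGb) {
        return "-Xmx" + heapGb + "G"
                + " -Dorg.bytedeco.javacpp.maxbytes=" + maxBytesGb + "G"
                + " -Dorg.bytedeco.javacpp.maxphysicalbytes=" + maxPhysicalGb + "G";
    }

    public static void main(String[] args) {
        long heapGb = 32;        // desired JVM heap (-Xmx)
        long maxBytesGb = 10;    // desired off-heap / GPU budget
        long jvmOverheadGb = 1;  // assumed allowance for the JVM itself
        long maxPhysicalGb = minMaxPhysicalGb(heapGb, maxBytesGb, jvmOverheadGb);
        System.out.println(flags(heapGb, maxBytesGb, maxPhysicalGb));
    }
}
```

The computed value is only the minimum that lets all three parts fit. Following the reasoning above, setting maxphysicalbytes to roughly the machine's physical RAM minus 1GB for the OS is also reasonable, as long as that figure is not below this minimum.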
Then set maxbytes to the amount of GPU memory you want to use. This is the maximum amount of memory used off-heap before JavaCPP starts trying to free memory. Again, the reason is that all memory sent to the GPU is also mirrored in off-heap memory.
The mirroring part I am not sure about. But if it really works like this, it would have helped me a great deal to have it spelled out step by step in the docs.
Also maybe link to https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/Pointer.java where these numbers are processed.