deeplearning4j-docs DL4J memory page: review + improve docs/examples

https://deeplearning4j.org/memory#configuring-memory-limits

Mainly around maxphysicalbytes: should be set to heap + maxbytes

More examples (that specify the exact limits for all types of memory, and behaviour for GPUs) would be good.

Jul 06 '18 00:07 AlexDBlack

Just quick pastes from Gitter, since family calls:

Point is, if you have (as per your example) -Xmx2G, maxbytes=8G and maxphysicalbytes=8G, then you ACTUALLY only have 6GB available for off-heap. Because if you use 2GB of JVM heap and 6GB off-heap, the "resident set" will be 8GB, and the maxphysicalbytes check will kick in. And that is on a very good day - BECAUSE the JVM itself also uses space in the resident set. So you will really only have 3-4 GB available before the maxphysicalbytes check kicks in.

So - me following your documentation: I want 32GB heap, and 10GB GPU. So I set - following your example, right? - -Xmx32G, maxbytes=10G, maxphysicalbytes=10G. And what happens is that on the VERY FIRST off-heap allocation, the maxphysicalbytes check kicks in. This is because I've now used 1GB for the JVM and 12GB of JVM heap (for the features), and then IMMEDIATELY I got an error that said physicalbytes > maxphysicalbytes.
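The arithmetic in the two cases above can be sketched as follows. This is only an illustration of the reasoning in this comment, not JavaCPP's actual check: the class and method names are hypothetical, and it assumes resident memory is roughly heap in use plus JVM overhead plus off-heap.

```java
public class OffHeapHeadroom {
    static final long GB = 1024L * 1024L * 1024L;

    /**
     * Bytes left for off-heap allocations before resident memory reaches
     * maxPhysicalBytes. A negative result means the limit is already exceeded.
     * Hypothetical helper: assumes resident = heap in use + JVM overhead + off-heap.
     */
    static long headroomBytes(long maxPhysicalBytes, long heapInUseBytes, long jvmOverheadBytes) {
        return maxPhysicalBytes - heapInUseBytes - jvmOverheadBytes;
    }

    public static void main(String[] args) {
        // Case 1: -Xmx2G, maxbytes=8G, maxphysicalbytes=8G, heap fully used,
        // JVM overhead ignored (the "very good day").
        long first = headroomBytes(8 * GB, 2 * GB, 0);
        System.out.println("Case 1 off-heap headroom: " + first / GB + " GB");

        // Case 2: -Xmx32G, maxbytes=10G, maxphysicalbytes=10G,
        // with 12GB of heap in use and about 1GB for the JVM itself.
        long second = headroomBytes(10 * GB, 12 * GB, 1 * GB);
        System.out.println("Case 2 off-heap headroom: " + second / GB + " GB");
    }
}
```

In the second case the headroom is already negative before any off-heap memory is requested, which matches the error showing up on the very first off-heap allocation.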

Jul 06 '18 09:07 stolsvik

So, basically, how this works out for dl4j: You need to set maxphysicalbytes to the highest number you want the process to take, including all three of the JVM itself (allow for 1GB), the JVM heap (-Xmx) and the number of bytes you want to use on the GPU (which is maxbytes). The reason for including the GPU bytes in this number is that all GPU memory is mirrored in off-heap memory.

The only thing that will happen when you hit the maxphysicalbytes number is that your process will effectively crash with an OutOfMemoryError. So a sane number for this is basically all of your physical memory (CPU memory, on the main board - i.e. not including the GPU memory), maybe minus 1GB or so for the OS. The rationale for this is that what you want to avoid is the OS starting to swap memory to disk, resulting in "memory thrashing" and extremely bad performance - and that it is maybe better to learn about this situation by your program crashing than to run an extremely inefficient machine learning process.

Then set maxbytes to the GPU size you want to use - this is the maximum amount of memory used off-heap before JavaCPP starts trying to free memory. And again, the reason for this is that all memory sent to the GPU is also mirrored in off-heap.
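A minimal sketch of the sizing rule described above, using the 32GB heap / 10GB GPU example from earlier in this thread. The class and method names are hypothetical, the 1GB JVM and 1GB OS allowances are the rough figures suggested above, and the 64GB host is an assumption for illustration. The flags are the JavaCPP system properties org.bytedeco.javacpp.maxbytes and org.bytedeco.javacpp.maxphysicalbytes.

```java
public class MemoryLimits {
    static final long GB = 1L << 30;

    /** Smallest sensible maxphysicalbytes: JVM itself + heap (-Xmx) + off-heap (maxbytes). */
    static long minMaxPhysicalBytes(long heapBytes, long maxBytes, long jvmOverheadBytes) {
        return jvmOverheadBytes + heapBytes + maxBytes;
    }

    /** Upper choice for maxphysicalbytes: host RAM minus a reserve for the OS. */
    static long hostCeilingBytes(long hostRamBytes, long osReserveBytes) {
        return hostRamBytes - osReserveBytes;
    }

    /** Formats the three limits as JVM command-line flags (values in whole GB). */
    static String jvmFlags(long heapGb, long maxBytesGb, long maxPhysicalGb) {
        return "-Xmx" + heapGb + "G"
                + " -Dorg.bytedeco.javacpp.maxbytes=" + maxBytesGb + "G"
                + " -Dorg.bytedeco.javacpp.maxphysicalbytes=" + maxPhysicalGb + "G";
    }

    public static void main(String[] args) {
        long heap = 32 * GB;      // wanted JVM heap
        long offHeap = 10 * GB;   // wanted GPU memory, mirrored off-heap
        long floor = minMaxPhysicalBytes(heap, offHeap, 1 * GB);
        long ceiling = hostCeilingBytes(64 * GB, 1 * GB); // assumed 64GB host

        if (floor > ceiling) {
            System.out.println("Requested limits do not fit in host RAM");
        } else {
            // Any value between floor and ceiling follows the rule above;
            // the floor is printed here.
            System.out.println(jvmFlags(heap / GB, offHeap / GB, floor / GB));
        }
    }
}
```

Under these assumptions the sketch prints -Xmx32G with maxbytes=10G and maxphysicalbytes=43G, instead of the maxphysicalbytes=10G that triggered the immediate failure.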

The mirroring stuff I am not sure about. But if it really is like this, then it would have helped me very much to have it spoon-fed to me in the docs.

Also maybe link to https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/Pointer.java where these numbers are processed.

Jul 06 '18 09:07 stolsvik