jdeb
Avoid excessive heap utilisation due to in-memory creation of md5s
In an application with more than 100K files, we ran into problems while generating the checksums. This change writes the checksums to a file and then streams from that file to the output stream, to avoid heap utilisation during that phase.
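The approach described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the class name `Md5SumsSpool`, the method `spool`, the temp file prefix, and the `digest  path` line layout are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Md5SumsSpool {

    /**
     * Writes one "digest  path" line per entry to a temp file, then streams
     * that file to the given output stream. Each entry is a two-element
     * array: {hexDigest, path}. Returns the number of bytes streamed.
     */
    public static long spool(Iterable<String[]> entries, OutputStream out) throws IOException {
        // Hypothetical temp file name; the real PR may choose differently.
        Path tmp = Files.createTempFile("md5sums", ".tmp");
        try {
            // Only one line at a time is held on the heap while writing.
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                for (String[] entry : entries) {
                    writer.write(entry[0]);
                    writer.write("  ");
                    writer.write(entry[1]);
                    writer.write('\n');
                }
            }
            // Files.copy streams through a small buffer instead of loading the file.
            return Files.copy(tmp, out);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The trade-off is extra disk I/O and a temp file to clean up, in exchange for heap usage that no longer grows with the number of files.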
Thanks for the contribution! I am a little puzzled, though, as to why this was a problem, even with 100k files. Assume 100k files times (random guess) 100 chars per line: that's 10,000,000 chars, which is probably around 20MB of RAM. Is that already what you meant by excessive? How much memory usage did you see? I am just wondering if this really was the problem.
Going to set a breakpoint and catch the length, and get a heap dump and tell you exactly what the utilisation is. Certainly in the hundreds of megabytes, due to the length of the paths.
Awesome - thanks!
Hundreds of megabytes? That sounds quite fishy.
Actually - maybe you could print out the file size of the temp file?
Or even better, provide the file, even if obfuscated (e.g. with a simple tr)?
The final md5sums file is 33M. The StringBuilder will retain double that of course, thanks to Java's 2-byte representation of strings:
java.lang.StringBuilder [JNI Global, Stack Local ← checksums, md5s] 75497512
And two more copies again of the same bytes:
checksums.toString()@ControlBuilder:147
pContent.getBytes("UTF-8")@ControlBuilder:212
So a little over 220MB I guess. Background is here incidentally (we've got our fair share of heap issues in our Gradle plugin!):
https://github.com/nebula-plugins/gradle-ospackage-plugin/issues/142
Thanks for digging into this. I guess the two copies are where the problem becomes excessive. I am wondering if we could dial back the crazy by getting rid of those copies. On the other hand, in-memory will always hurt scalability.
I am not so eager to use temp files, but at first glance the PR looks reasonable. I need to poke around a bit more, but I am inclined to accept it. Thanks for your work!
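For illustration, getting rid of the two extra copies while staying in memory might look like the sketch below. It is an assumption about how this could be done, not jdeb code: the class `ChunkedEncode` and the method `writeUtf8` are hypothetical names. It encodes the `StringBuilder` in small chunks rather than calling `toString()` and `getBytes("UTF-8")` on the whole content.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ChunkedEncode {

    /**
     * Encodes the builder's content as UTF-8 onto the stream, 8192 chars at
     * a time, so no full-size String or byte[] copy is created.
     */
    public static void writeUtf8(StringBuilder sb, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buf = new char[8192];
        int pos = 0;
        while (pos < sb.length()) {
            int n = Math.min(buf.length, sb.length() - pos);
            // Copy a slice of the builder into the reusable buffer.
            sb.getChars(pos, pos + n, buf, 0);
            writer.write(buf, 0, n);
            pos += n;
        }
        // Flush but do not close: the caller owns the output stream.
        writer.flush();
    }
}
```

Note that `Writer.append(CharSequence)` would not help here, since its default implementation converts the whole sequence to a `String` first. The `StringBuilder` itself still holds the full content, so this only removes the copies, not the underlying scalability limit.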
(I so need to get started on jdeb2)