javacpp-presets
Memory growth test of Tritonserver
We tried to test for memory growth by gathering memory usage statistics while doing inference. Each time we run an inference, we record how much memory it allocated. We found that ("The max allocation of memory when doing a single inference" - "The average allocation of memory when doing a single inference") / ("The max allocation of memory when doing a single inference") = 0.46, which means the variation is very large: it ranges from about 700 MB to 1500 MB. Why?
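(In other words, with a peak of about 1500 MB and a ratio of 0.46, the average works out to roughly 0.54 × 1500 ≈ 810 MB per inference.)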
Attached are the simple.java file and the test.sh script to reproduce this; one needs to modify the directories in test.sh accordingly.
@saudet
There are a couple of things that could be happening, but the first thing you should check for is dangling Pointer objects. Try to run a command like this:
mvn clean compile exec:java -Dorg.bytedeco.javacpp.logger.debug -DargLine=-Xmx1000m 2>&1 | grep Collecting | grep -v 'ownerAddress=0x0'
If you see any output from that, you should find where those Pointer objects are not getting deallocated and call close() on them, or you could use PointerScope where appropriate: http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/
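For reference, a minimal sketch of both options (BytePointer is used here purely as an illustration; any Pointer subclass behaves the same way):

import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacpp.PointerScope;

// Option 1: deallocate explicitly; Pointer implements AutoCloseable
try (BytePointer buffer = new BytePointer(1024)) {
    // ... use buffer ...
} // buffer's native memory is freed here

// Option 2: register everything allocated inside a PointerScope
try (PointerScope scope = new PointerScope()) {
    BytePointer a = new BytePointer(1024);
    BytePointer b = new BytePointer(2048);
    // ... use a and b ...
} // all Pointer objects created in this scope are deallocated here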
Looks like there are quite a few there. Since there are lots of "new BytePointer" calls in Simple.java, which ones do I need to call close() on? client_10.log
Just try to use PointerScope...
So it will release/close them by itself?
Kind of, it's like a scope in C++, see the example here: http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/
I did some tests today with PointerScope, as attached, but memory usage still fluctuates: ("The max allocation of memory when doing a single inference" - "The average allocation of memory when doing a single inference") / ("The max allocation of memory when doing a single inference") = 0.52, which means the variation is still too big; it ranges from about 50 MB to 500 MB. Are there more ways to debug this? 20220129_PointerScope.zip
Please check the debug log like I asked you to do above https://github.com/bytedeco/javacpp-presets/issues/1141#issuecomment-1023895781
So you mean that even with PointerScope added, there can still be pointer leaks if the PointerScope does not cover all the pointers?
I added "try (PointerScope scope = new PointerScope()) {" right at the beginning of the main function, so why are there still lots of "Debug: Collecting org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x0,deallocatorAddress=0x0]" entries in the output log file?
Those are fine, their ownerAddress is 0. If you see any that have an address other than 0, then you should find what those are. If all that you see do not have an address, then you're probably dealing with GC issues on the Java heap. Try a different collector: https://developers.redhat.com/articles/2021/11/02/how-choose-best-java-garbage-collector
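For example, switching to the parallel collector from that article is just a startup flag, mirroring the mvn command above (how argLine reaches the JVM depends on your Maven setup):

mvn clean compile exec:java "-DargLine=-XX:+UseParallelGC -Xmx1000m"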
BTW, how did you make sure this is happening only with Java, and not with C++? Maybe it's a problem with Triton...
good point.
Searched the log file:
Debug: Releasing org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x7f29bf518190,deallocatorAddress=0x7f29c7ec4090]
Debug: Collecting org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x0,deallocatorAddress=0x0]
All the "Collecting" entries have ownerAddress=0x0.
Samuel: We designed our test case like this:
- We start a thread to monitor memory usage (a runnable sketch of this setup follows the list):
a. the thread samples memory usage every two seconds;
b. each sample is recorded with:
DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
double memory = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
stats.accept(memory);
c. from these samples we calculate the delta:
double memory_allocation_delta = stats.getMax() - stats.getAverage();
double memory_allocation_delta_mb = memory_allocation_delta / 1E6;
double memory_allocation_delta_percent = memory_allocation_delta / stats.getMax();
If memory_allocation_delta_percent is larger than 10 percent, the test fails.
- The main process runs:
for (int i = 0; i < 1000000; i++) {
    RunInference(server, model_name, is_int, is_torch_model);
}
We assume that every two seconds some calls to RunInference will have completed, with some memory allocated and some freed during each call, so the variation of memory_allocation_delta_percent should not exceed 10%. What do you think? Is this the right way to test the memory growth of a Java process?
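Putting the pieces together, here is a minimal self-contained sketch of that monitoring setup (RunInference and its arguments are stand-ins for the method in the attached Simple.java):

import java.util.DoubleSummaryStatistics;

public class MemoryGrowthTest {
    public static void main(String[] args) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();

        // monitor thread: sample the used heap every two seconds
        Thread monitor = new Thread(() -> {
            while (true) {
                double memory = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
                synchronized (stats) {
                    stats.accept(memory);
                }
                try { Thread.sleep(2000); } catch (InterruptedException e) { return; }
            }
        });
        monitor.setDaemon(true);
        monitor.start();

        for (int i = 0; i < 1000000; i++) {
            // RunInference(server, model_name, is_int, is_torch_model); // from Simple.java
        }

        synchronized (stats) {
            double delta = stats.getMax() - stats.getAverage();
            System.out.printf("delta = %.1f MB, %.0f%% of max%n",
                    delta / 1E6, 100 * delta / stats.getMax());
        }
    }
}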
@saudet
Well, that's a question about Triton more than anything else, I think. All buffers should be preallocated as much as possible, so variations like that don't occur.
Since each time we do an inference, in the RunInference function, we allocate lots of buffers, do you mean these need to be replaced by static/preallocated memory? And if we re-allocate these buffers on every inference, is this variation normal for a Java process?
That has nothing to do with Java! You're allocating these buffers for Triton, not Java. This is something that needs to be fixed for Triton.
Yes, we allocate these buffers for Triton to do inference or to compare some results. So let's say I allocate these buffers as static/preallocated ones and the variation issue goes away; would that mean the GC is not working well enough?
Preallocating and reusing objects that use memory on the Java heap helps the GC, but it's possible to tune the GC to be able to cope better with larger amounts of garbage too, yes.
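As a sketch of that reuse pattern (the buffer size and fill logic here are made up; the real buffers are the ones created inside RunInference in Simple.java):

import org.bytedeco.javacpp.BytePointer;

public class Buffers {
    // allocate the native buffer once instead of on every inference
    static final BytePointer INPUT = new BytePointer(16 * 4); // hypothetical size

    static void fill(byte[] input) {
        // rewind and overwrite the same buffer: no new BytePointer per call
        INPUT.position(0).put(input);
        // ... pass INPUT to Triton as before ...
    }
}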
So you mean the approaches listed here to tune the GC for larger amounts of garbage? https://developers.redhat.com/articles/2021/11/02/how-choose-best-java-garbage-collector#parallel_collector
That kind of thing, yes, but if the requests that you get don't require allocating different kinds of buffers all the time, it's more efficient to just reuse those buffers. That's probably what your users are asking about.
Here are the default JVM parameters:
root@4a42d065cf6e:/workspace/javacpp_presets_upstream/javacpp-presets/tritonserver# java -XX:+PrintCommandLineFlags -version
-XX:G1ConcRefinementThreads=10 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=524877248 -XX:MaxHeapSize=8398035968 -XX:+PrintCommandLineFlags -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
Which parameters do you think probably need tuning?
So, since I want to make the largest allocated memory static, how can I know which buffer/object is the largest one?
Not just the largest one, all of them, if possible. I'm guessing that ideally your users want this to be "garbage free" to get the lowest latency possible, for real time applications, but I'm just guessing. You should try to find out what the needs of your users are, and then we can figure out how to meet those needs.
For now this test is just internal, but users will probably have that sort of requirement? I'm not sure what Java users care most about.
Well, if what you care most about is money, HFT is where it's at for low-latency Java applications:
https://www.efinancialcareers.com/news/2020/11/low-latency-java-trading-systems
https://medium.com/@jadsarmo/why-we-chose-java-for-our-high-frequency-trading-application-600f7c04da94
https://www.azul.com/use-cases/trading-risk/
https://github.com/OpenHFT
But personally I prefer working on embedded systems such as the ones from aicas: https://www.aicas.com/wp/use-cases/ @jjh-aicas Do you have use cases where machine learning and GPUs could be of help?
Samuel:
Today I did more tests on GC and Heap:
- Command line arg is: -DargLine=-Xmx1000m
- While the test is running, allocated heap memory grows gradually from about 60 MB to 4000 MB. (Here, allocated memory is calculated as: Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory().) Details are attached as client.log.
- GC info is collected with the command: jstat -gc 10524 500. Looks like "OU" grows fast! Details are attached as gc.log.
Why "OU" grows fast here? @saudet client.log gc.log
That's the "old space" apparently: https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html Here's some doc about that: https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/garbage_collect.html So it just looks like there are buffers that can't be freed because they are still referenced somewhere.
Then how can I quickly locate which APIs/calls still reference these buffers?
Flight Recorder can usually help with that: https://docs.oracle.com/javase/9/troubleshoot/troubleshoot-memory-leaks.htm#JSTGD271 https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u
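For example, on the OpenJDK 11 build shown above, JFR is built in, so a recording can be started with jcmd (reusing pid 10524 from the jstat run; the duration and filename are just illustrative):

jcmd 10524 JFR.start duration=60s filename=recording.jfr

The resulting recording.jfr can then be opened in JDK Mission Control to see which call sites are keeping the old generation growing.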