luceneutil
Confirm new JFR profiling is adding minimal overhead
I have largely trusted that enabling JFR does not harm performance much, as long as you don't configure overly aggressive sampling.
But I've seen internal evidence (Amazon product search, closed source) that JFR might hurt red-line QPS non-trivially, even when using the same .jfc configuration as we use here in luceneutil.
Let's try enabling/disabling JFR and confirm the overhead is not too bad (<= 1%?).
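One way to run the comparison: do two otherwise-identical runs, one with no JFR flags at all and one with `-XX:StartFlightRecording` pointed at the same .jfc we ship. This is only a sketch of the launch commands — `MyBenchmark` is a placeholder, not luceneutil's actual entry point:

```shell
# Run 1: baseline, JFR fully disabled
java MyBenchmark

# Run 2: identical run, JFR enabled with the same settings file
java -XX:StartFlightRecording=settings=profiling.jfc,filename=bench.jfr MyBenchmark

# Compare red-line QPS between the two; repeat several times to
# separate JFR overhead from run-to-run noise.
```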
Agreed this is important to do.
The stuff in the profiling.jfc from the apache repo was geared at testing, where you opt in with the tests.profile parameter. But I have a 2-core machine so I wanted to keep overhead low :) Still, it may not be appropriate for benchmarks.
So, for example, I tweaked jdk.ExecutionSample and jdk.NativeMethodSample from their defaults of 10 ms / 20 ms down to 1 ms. I also spent some time (maybe not enough?) trying to see whether finer granularity, such as microseconds, was possible :)
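For reference, those periods live in the .jfc settings file as plain XML, so relaxing them later is a one-line change per event. An excerpt showing the 1 ms values described above (the surrounding `<configuration>` element is omitted):

```xml
<!-- Sampling events tightened from their 10 ms / 20 ms defaults -->
<event name="jdk.ExecutionSample">
  <setting name="enabled">true</setting>
  <setting name="period">1 ms</setting>
</event>
<event name="jdk.NativeMethodSample">
  <setting name="enabled">true</setting>
  <setting name="period">1 ms</setting>
</event>
```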
If there are perf issues, maybe try to see if these can be relaxed. For the benchmark, I think we could improve the profiling quality in other ways instead.
In particular, if we could separate the profiling output by query. Maybe I am confused: if I look at the BooleanQuery profile, I see vectors stuff? Or is that some up-front shenanigans in the benchmark engine unrelated to the test (https://github.com/mikemccand/luceneutil/issues/77#issuecomment-758752817)? Either way, it's confusing :)
If the "vectors stuff" is related to VectorDictionary, then yeah, it is setup code. We could improve the situation by using a better (on-disk) format like an FST, so we wouldn't have to load it into RAM during setup. If it's not VectorDictionary, then it's a bug. Also, is there a way to enable sampling only on the query threads? Or to analyze only the samples from those threads?
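On the second question: `jdk.ExecutionSample` events carry the sampled thread, so even without per-thread sampling control we can post-filter the dump with `jdk.jfr.consumer.RecordingFile`, keeping only samples whose thread name matches the query threads. A self-contained sketch — the `searchThread` name and the busy-loop "query work" are stand-ins, not what luceneutil actually uses:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

public class FilterByThread {
    public static void main(String[] args) throws Exception {
        Path dump = Files.createTempFile("bench", ".jfr");
        try (Recording rec = new Recording()) {
            rec.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(1));
            rec.start();
            // Stand-in for query work, on a recognizably named thread:
            Thread t = new Thread(() -> {
                long x = 0;
                for (int i = 0; i < 50_000_000; i++) x += (long) i * i;
                System.out.println(x > 0 ? "done" : "done");
            }, "searchThread-0");
            t.start();
            t.join();
            rec.stop();
            rec.dump(dump);
        }
        // Keep only execution samples taken on query threads:
        long queryThreadSamples = RecordingFile.readAllEvents(dump).stream()
            .filter(e -> e.getEventType().getName().equals("jdk.ExecutionSample"))
            .filter(e -> e.getThread("sampledThread") != null
                      && e.getThread("sampledThread").getJavaName().startsWith("searchThread"))
            .count();
        System.out.println("samples on query threads: " + queryThreadSamples);
    }
}
```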
We might also be able to turn on JFR only after all the initialization is done, so we only see the "long running queries" type of hot spots.
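The programmatic `jdk.jfr.Recording` API would make that straightforward: run all setup with no recorder active, then start one just before the query loop. A minimal sketch — the built-in "profile" configuration stands in for our profiling.jfc here, and `runQueries` is a placeholder for the benchmark's real query loop:

```java
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class LateStartProfiling {
    public static void main(String[] args) throws Exception {
        // ... heavy one-time setup runs here, unrecorded (index open,
        // VectorDictionary load, query parsing, warmup, etc.) ...

        // Start JFR only for the measured query phase.
        try (Recording rec = new Recording(Configuration.getConfiguration("profile"))) {
            rec.start();
            runQueries();   // placeholder for the real query loop
            rec.stop();
            rec.dump(Path.of("queries-only.jfr"));
        }
        System.out.println("wrote queries-only.jfr");
    }

    static void runQueries() throws InterruptedException {
        Thread.sleep(100); // stand-in for real query work
    }
}
```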