Document on the Guava wiki how to run one of Guava's benchmarks for yourself
Original issue created by [email protected] on 2014-01-22 at 09:00 PM
One reason we check in our benchmarks is so that people can verify that they get equivalent results in their own environments. It would be nice if they had a "simple" how-to for running one.
Original comment posted by [email protected] on 2014-01-22 at 09:17 PM
Step one: release new Caliper beta so they can actually compile.
Any update on this? I'm able to compile guava-tests but still have no clue how to run the benchmarks.
It seems I am also able to compile guava-tests, but I can't find any info on how to run the benchmarks. Is there any interest in supporting external benchmark execution?
Is there documentation around how to run benchmarks?
?????? How do we run the benchmarks google ???????
I am not sure if we've ever run them in the open-source world :(
It looks like the benchmarks are part of the tests.jar that is generated if you build all of guava-tests.
Then you'd need Caliper. The latest release is fairly old and doesn't come with the uber-jar that we apparently used to provide, so you'd probably need to build it yourself and then assemble a classpath with all the transitive deps of Guava and Caliper both :\
Then hopefully you "just" run com.google.caliper.runner.CaliperMain with an argument like com.google.common.base.AsciiBenchmark and optionally some other args.
It's probably possible, but even if it is, we haven't made it easy.
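For anyone attempting this, a tiny launcher class is one way to kick off the runner from an IDE once that classpath is assembled. This is only a sketch based on the invocation described above: the RunGuavaBenchmark class name is made up for illustration, and it assumes guava, the guava-tests tests.jar, Caliper, and all their transitive dependencies are already on the classpath.

```java
import com.google.caliper.runner.CaliperMain;

// Hypothetical launcher, not part of Guava or Caliper. It simply delegates to
// Caliper's command-line entry point, passing the fully qualified benchmark
// class name (plus any optional Caliper arguments you want to forward).
public class RunGuavaBenchmark {
  public static void main(String[] args) {
    CaliperMain.main(new String[] {"com.google.common.base.AsciiBenchmark"});
  }
}
```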
@cpovirk Thank you, and apologies for the ... enthusiastic comment. I've factored it into my PR work. JMH's annotations are very close to Caliper's anyway. I'll try the direct run route above and break it out into a separate PR, since it's related to the Java 21 work but not a high priority for it.
Mostly I'm just trying to see whether there is any change in library performance when compiling at a different bytecode level. I can accomplish that with an amended JMH suite, then clean up and keep the work in case it's usable.
If there's no objection, I may move it to a different module as part of that work and add a README with instructions for running the harness. I'm sure you've seen the usual threads about this, so merging any of it is entirely at Google's discretion, offered optionally, etc.
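To illustrate the point above that JMH's annotations map closely onto Caliper's, here is a rough sketch of what a parallel JMH benchmark in the spirit of Guava's AsciiBenchmark could look like. The class name, the input string, and the choice to exercise Ascii.toLowerCase are illustrative assumptions, not existing Guava code.

```java
import com.google.common.base.Ascii;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

/** Hypothetical JMH sketch of a benchmark in the spirit of the Caliper AsciiBenchmark. */
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class AsciiJmhBenchmark {

  private final String input = "The Quick Brown Fox Jumps Over The Lazy Dog";

  @Benchmark
  public String toLowerCase() {
    // Returning the result lets JMH consume it, which prevents dead-code elimination.
    return Ascii.toLowerCase(input);
  }
}
```

A suite like this could then be compiled at different bytecode levels and rerun to compare the numbers, which is the experiment described above.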
I have definitely been curious about the performance impact of nestmates (whose performance impact was definitely measurable in some Google code) and probably other stuff. (Granted, I don't know if we have benchmarks for any code that would be affected by any particular bytecode feature.)
A big reason that many of our benchmarks still use Caliper is that it enables us to also run them on Android. That would probably keep us from merging any in-place replacements. I could imagine accepting a parallel set of JMH benchmarks, especially if it turns out that the results differ from Caliper in interesting ways :) (I could probably take on running both sets of benchmarks once the JMH ones are available, since, as established, running the Caliper benchmarks externally is nontrivial.)
@cpovirk I've actually rolled back the JMH migration, having finally figured out how to run the Caliper benchmarks. I'm going to write a wiki page guide as a contribution, with your permission of course. I don't think Caliper is reporting results anywhere yet, but if it is, I'll see if I can get it reporting into CI.
It would be very cool to see continuous benchmarks on Guava. I'm sure Google has it internally but externally it's sort of a question mark. Personally, I'm most curious about what bytecode targeting does to those numbers.
A wiki page would be great, thanks.
Caliper used to be able to report results to an app, but we turned that down a while back. I'm sure it would be possible to dump output files into GitHub or something, but I probably wouldn't bother, especially since I don't know how consistent the results would be for VMs on ever-changing hardware.
(Internally, I want to say that there is some infrastructure for continuous benchmarks but that we've never wired up our benchmarks? I'm not actually sure. We more often look at fleet-wide profiling data, at least until it comes time to actually optimize some specific implementation.)