Add benchmarks for the explicit-state model checker TLC

Open lemmy opened this issue 4 years ago • 15 comments

The model checker (https://github.com/tlaplus/tlaplus) is optimized for throughput and has a single hot path. I can come up with benchmarks for a) scalability and b) throughput. One is a spec where evaluation of the next-state relation is cheap, the other has a costly next-state relation. Adjusting specs to the available hardware is just a matter of choosing a few parameters. Benchmarks can be distributed as two separate, self-contained jars.

lemmy avatar Nov 07 '19 17:11 lemmy

Thanks @lemmy for the suggestion! A TLA+ benchmark would definitely fit very well in the spirit of the Renaissance suite. Would those benchmarks need to be written, or do they already exist in https://github.com/tlaplus/Examples? I also see that everything is MIT licensed, so that's perfect.

One constraint we have for a new benchmark is that it must deterministically compute the same thing every time, so we should avoid randomness and side effects. Basically, two subsequent calls to run the workload should perform exactly the same work so that they stay comparable. Another constraint is for the workload to take a few seconds to run on a standard machine. We're not too strict about this, but it would be good if it's in the same ballpark as the other benchmarks (so [0.5, 5] sec per iteration would be fine). But I guess we can always tweak some parameters to reach that goal.

If you are willing to open a PR, adding a new benchmark (or set of benchmarks) to renaissance is as simple as creating a subdirectory in the benchmarks directory, describing your maven dependencies in the sbt config, and writing the actual workload in a runIteration method of a class that extends Benchmark. This is detailed in the contribution guide. Alternatively, you could point us to a demo workload and we can do the integration ourselves.
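
For illustration, the benchmark class could be as thin as the sketch below. Only "extends Benchmark" and runIteration come from our harness; the imports, the method signature, and the TLC arguments are just guesses to show the shape, so take the contribution guide as the reference:

import org.renaissance.{Benchmark, Config}

// Rough sketch only: the runIteration signature and the TLC flags below are
// illustrative, not the exact interface.
class TlcThroughput extends Benchmark {
  override def runIteration(config: Config): Unit = {
    // Run TLC on a fixed spec with fixed parameters so that every iteration
    // deterministically performs the same work. Note that tlc2.TLC.main may
    // call System.exit, so a programmatic entry point would be preferable.
    tlc2.TLC.main(Array("-workers", "1", "-config", "Throughput.cfg", "Throughput.tla"))
  }
}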

farquet avatar Nov 08 '19 09:11 farquet

Thanks for the suggestion - this looks like an interesting workload. To add to what Francois said, there is a benchmark subfolder benchmarks/dummy that is a minimal example of how to add a new benchmark group - if you would like to go ahead and add a TLC benchmark, that's a good template to start from.

axel22 avatar Nov 08 '19 17:11 axel22

Is there an example that shows how to add a benchmark that comes as a self-contained jar file?

lemmy avatar Nov 15 '19 00:11 lemmy

All jar dependencies we have come from maven central and they are defined in the build.sbt of each project. Here is an example.
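
For instance, if tla2tools is (or gets) published to maven central, the declaration would be a one-liner along these lines (the coordinates below are placeholders, just to show the shape):

// Illustrative only -- group id, artifact id and version are placeholders.
libraryDependencies += "org.lamport" % "tla2tools" % "1.6.0"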

Wouldn't it be possible to have the required dependencies there and the thin benchmark layer as sources here?

farquet avatar Nov 15 '19 01:11 farquet

That could be done but the jar would be tailored to the benchmark. Not sure we want to "pollute" maven central with such a jar. By the way, the jar can be pack200ed to a few hundred kbs.

lemmy avatar Nov 15 '19 01:11 lemmy

I see. And how about the second option? Having the benchmark definition here has advantages like easier source code exploration, being able to adapt the workload size, etc. Could that be possible?

If it's a matter of the work that has to be done to turn the sample application into a benchmark, we could do that part.

farquet avatar Nov 15 '19 01:11 farquet

It is probably best to ditch the idea of a self-contained jar and instead use the vanilla jar from maven central and check the specs (input) into this repository. Is the benchmark layer going to fork a VM?

lemmy avatar Nov 15 '19 01:11 lemmy

Is the benchmark layer going to fork a VM?

By default, the harness does not fork a new VM. The optional JMH-compliant build, however, uses JMH as the harness, which does fork a new VM.

axel22 avatar Nov 15 '19 01:11 axel22

It is probably best to ditch the idea of a self-contained jar and instead use the vanilla jar from maven central and check the specs (input) into this repository.

That would be great indeed. It matches how the other benchmarks are built. For instance, several benchmarks include their input files in a resources folder, like here.

Alternatively, if the input is small it can also be inlined into the benchmark definition.
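
If you go the resources route, the setup code is trivial. Something along these lines (the resource path, file name, and helper are made up) would copy a bundled spec to a scratch directory before the measured runs:

import java.nio.file.{Files, Path, StandardCopyOption}

// Illustrative helper: copies a TLA+ spec bundled in src/main/resources to a
// scratch directory so TLC can read it from disk. Paths are placeholders.
object SpecResources {
  def materializeSpec(scratchDir: Path): Path = {
    val target = scratchDir.resolve("Grid5k.tla")
    val in = getClass.getResourceAsStream("/tlc/Grid5k.tla")
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }
}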

farquet avatar Nov 15 '19 01:11 farquet

Two questions:

  • How do you make sbt consume maven snapshot dependencies?
  • What is the meaning of the optional JMH-compliant build? TLC fails if we check two different specs with a single VM instance (VM has to be fresh).

lemmy avatar Nov 15 '19 17:11 lemmy

How do you make sbt consume maven snapshot dependencies?

You can use the -SNAPSHOT suffix just like in a pom file. So you can add something like this to your build.sbt:

libraryDependencies += ("org.lib" %% "lib" % "1.0-SNAPSHOT" % "test").changing()

changing() forces a re-download in case you push an updated version.

It would be good to keep that for development only, though, to avoid depending on artifacts that can change in the future.

What is the meaning of the optional JMH-compliant build?

We have users who prefer using JMH builds for convenience because they have tools to run such jars. However, the renaissance harness is the default way of running benchmarks because our goal is to augment it with tools that analyze the workloads automatically.

TLC fails if we check two different specs with a single VM instance (VM has to be fresh).

This is not a problem if your two workloads are distinct benchmarks, because renaissance benchmarks are meant to be executed individually anyway. Your benchmarks can still be part of the same sbt project though. Create a subdirectory named something like model-checker in the benchmarks directory (it will be discovered automatically) and create two individual classes that extend Benchmark, one for each workload.
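
Concretely, the layout could look roughly like this (all names are just examples):

benchmarks/model-checker/
  build.sbt                                 <- TLC dependency declared here
  src/main/resources/                       <- TLA+ specs and model configs
  src/main/scala/.../TlcScalability.scala   <- extends Benchmark
  src/main/scala/.../TlcThroughput.scala    <- extends Benchmark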

farquet avatar Nov 15 '19 22:11 farquet

TLC fails if we check two different specs with a single VM instance (VM has to be fresh).

In addition to what Francois said: you should be aware that the Renaissance harness will call the run method of the benchmark (i.e. the entry point into your workload) multiple times. The reason is that the harness first does a couple of warmup runs, and then does several runs with measurements. This means that you should make sure there is no global state on which a run depends, or, if there is such global state, that you reset it in the setup and teardown methods.
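
As a sketch of that pattern (the hook names and signatures below are placeholders for our actual setup/teardown methods, not the real API), keep all per-run state in fields that setup recreates and teardown discards:

import java.nio.file.{Files, Path}
import java.util.Comparator
import org.renaissance.{Benchmark, Config}

// Sketch only: hook names are placeholders; the point is that nothing produced
// by one run leaks into the next one.
class TlcScalability extends Benchmark {
  private var workDir: Path = _

  override def setUpBeforeEach(config: Config): Unit = {
    // Fresh scratch directory per run, so repeated runs never see stale state.
    workDir = Files.createTempDirectory("tlc-run")
  }

  override def runIteration(config: Config): Unit = {
    // Point TLC's metadata directory at the per-run scratch space.
    tlc2.TLC.main(Array("-metadir", workDir.toString, "Scalability.tla"))
  }

  override def tearDownAfterEach(config: Config): Unit = {
    // Recursively delete the scratch directory (deepest entries first).
    Files.walk(workDir)
      .sorted(Comparator.reverseOrder[Path]())
      .forEach(p => Files.delete(p))
  }
}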

axel22 avatar Nov 15 '19 23:11 axel22

@axel22 If that's a strict requirement, I don't see a way to move forward. I don't have the bandwidth to change TLC to support warmup runs (I once hacked something based on OSGi). Initially, I assumed renaissance would accept a macro-benchmark that runs for a few minutes and thus needs no warmups. TLC fits this type of benchmark very well.

lemmy avatar Nov 16 '19 00:11 lemmy

@lemmy It's still possible to have single runs, but it would require setting the default number of repetitions to 1 in the benchmark definition. In this case, there would be no warmup. I think it is still useful to contribute such a benchmark - we could keep it in the staging area, as that would allow others to run it and analyze it, and maybe even invest in adding a multiple-runs-per-process ability to TLC later.

axel22 avatar Nov 16 '19 01:11 axel22

https://github.com/tlaplus/tlaplus/tree/master/general/performance/Grid5k would be the spec that easily simulates two workloads: scalability and computation.

lemmy avatar Nov 25 '19 23:11 lemmy