owi icon indicating copy to clipboard operation
owi copied to clipboard

benchmark `use_ite_for_select`

Open redianthus opened this issue 8 months ago • 7 comments

in interpret.ml, it is set to true for now, if both prove useful, a CLI flag could be offered

redianthus avatar Apr 15 '25 20:04 redianthus

So I ran some benchmarks (z3, 30 second timeout with d6fabb2) and here are the results. Current version:

| Nothing | Reached | Timeout | Other | Killed | Total |
|:-------:|:-------:|:-------:|:-----:|:------:|:-----:|
|      1 |    672 |    501 |     37 |      0 |   1211 |

Time stats (in seconds):

| Total | Mean | Median | Min | Max |
|:-----:|:----:|:------:|:---:|:---:|
| 16808.30 | 13.88 | 4.01 | 0.23 | 30.00 |

With use_ite_for_select set to false:

| Nothing | Reached | Timeout | Other | Killed | Total |
|:-------:|:-------:|:-------:|:-----:|:------:|:-----:|
|      1 |    671 |    503 |     36 |      0 |   1211 |

Time stats (in seconds):

| Total | Mean | Median | Min | Max |
|:-----:|:----:|:------:|:---:|:---:|
| 16755.74 | 13.84 | 3.87 | 0.23 | 30.00 |

So the second one is marginally faster, but there are some differences in solved goals, for example:

  • testcomp/sv-benchmarks/c/eca-rers2012/Problem18_label00.c solved in 13s & timeout.
  • testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.adding.2.prop1-back-serstep.c solved in 10s & timeout.
  • testcomp/sv-benchmarks/c/seq-mthreaded/pals_lcr.5_overflow.ufo.UNBOUNDED.pals.c solved in 20s & timeout
  • testcomp/sv-benchmarks/c/loops/insertion_sort-1-2.c solved in 7s & timeout.
  • testcomp/sv-benchmarks/c/loops/invert_string-1.c timeout & solved in 0.5s.
  • testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.adding.6.prop1-back-serstep.c timeout & solved in 16s.

There are also tests in which the result is other by one and solved by the other:

  • testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.arbitrated_top_n3_w16_d128_e0.c timeout & other.
  • testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.arbitrated_top_n3_w8_d128_e0.c timeout & other.
  • testcomp/sv-benchmarks/c/nla-digbench-scaling/ps4-ll_unwindbound100.c other & solved in 1s
  • testcomp/sv-benchmarks/c/nla-digbench-scaling/ps5-ll_unwindbound100.c other & solved in 1s
  • testcomp/sv-benchmarks/c/product-lines/email_spec11_productSimulator.cil.c other & solved in 3.5s

There is a slight complementarity between the two, I don't know if its worth adding a flag for it, but it probably needs a more through investigation (and understanding of why there are these changes). But I can add the flag, what do you think @redianthus? Note: I can't reproduce locally (with 1 worker) the results I got from symbocalype, not sure if the differences come from parallelism or some particular settings/command line options that change the behavior of Owi.

hra687261 avatar Oct 29 '25 21:10 hra687261

Yes, I think adding the flag is worth it.

To get a better idea of the results, could you run this tool on the two results file you got? In particular, I am interested in seeing the execution time variation only on goals solved by both tools. For now I'd say we should set the option to true by default, but I may change my mind after seeing the diff results.

I think there are two interesting questions (to which I don't have an answer :-)):

  1. Could we write some heuristics to decide if use_ite_for_select is interesting, either on a given program or on a given select instruction?
  2. Could we use an ite in other places? (This one is probably easier to answer).

redianthus avatar Oct 31 '25 10:10 redianthus

Ok! Here's the output of the diff tool:

tool1 had 672 reached and tool2 had 671 reached
tool1 had 501 timeout and tool2 had 503 timeout
tool1 had 001 nothing and tool2 had 001 nothing
tool1 had 037 other   and tool2 had 036 other
tool1 had 000 killed  and tool2 had 000 killed

tools have 665 reached tasks in common

tool1 had 007 tasks tool2 did not found
tool2 had 006 tasks tool1 did not found

on      commonly  reached tasks, tool 1 took 1595.700267 sec. (mean 2.399549, median 0.548621, min 0.306081, max 27.683100) and tool 2 took 1575.438922 sec. (mean 2.369081, median 0.542383, min 0.321276, max 29.703400)
on *not commonly* reached tasks, tool 1 took 131.851030 sec. (mean 18.835861, median 20.340000, min 6.878930, max 28.258600) and tool 2 took 40.464773 sec. (mean 6.744129, median 9.951390, min 0.592530, max 17.833700)

among tasks reached only by tool 1, tool 2 replied 000 nothing, 007 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 003 timeout, 03 other and 00 killed
tasks solved only by tool 1:
  testcomp/sv-benchmarks/c/xcsp/CostasArray-12.c
  testcomp/sv-benchmarks/c/xcsp/AllInterval-016.c
  testcomp/sv-benchmarks/c/seq-mthreaded/pals_lcr.5_overflow.ufo.UNBOUNDED.pals.c
  testcomp/sv-benchmarks/c/loops/insertion_sort-1-2.c
  testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.bakery.6.prop1-back-serstep.c
  testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.adding.2.prop1-back-serstep.c
  testcomp/sv-benchmarks/c/eca-rers2012/Problem18_label00.c
tasks solved only by tool 2:
  testcomp/sv-benchmarks/c/xcsp/AllInterval-013.c
  testcomp/sv-benchmarks/c/product-lines/email_spec11_productSimulator.cil.c
  testcomp/sv-benchmarks/c/nla-digbench-scaling/ps5-ll_unwindbound100.c
  testcomp/sv-benchmarks/c/nla-digbench-scaling/ps4-ll_unwindbound100.c
  testcomp/sv-benchmarks/c/loops/invert_string-1.c
  testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.adding.6.prop1-back-serstep.c

hra687261 avatar Oct 31 '25 15:10 hra687261

Which one of tool1/tool2 was with ite/without ite? You can open a PR with the flag (it can just set a global variable, no need to propagate the value through all functions...). Depending on the result, we'll try to choose the best default value.

I added two other issues for the heuristics/other places and we can have a look at them later.

redianthus avatar Nov 03 '25 14:11 redianthus

Which one of tool1/tool2 was with ite/without ite?

Tool1 one is the default behavior (with ite) and tool2 is without.

You can open a PR with the flag (it can just set a global variable, no need to propagate the value through all functions...). Depending on the result, we'll try to choose the best default value.

https://github.com/OCamlPro/owi/pull/812

hra687261 avatar Nov 03 '25 14:11 hra687261

OK, so, I'd say without ite is better then.

redianthus avatar Nov 03 '25 14:11 redianthus

Since the time difference between the two is not very large (altough tool2 two seems slightly faster) and since tool1 reaches 672 goals while tool2 reaches 671 ones, it does not seem clear-cut to me, but you know better :)

hra687261 avatar Nov 03 '25 14:11 hra687261