sv-benchmarks Reachability benchmarks with overflow

I recently found many benchmarks from the reachability category which contain overflow and thus undefined behaviours. These include nla-digbench/hard.c as mentioned in #1105 and the ones I mentioned in #1159. Below are some more.

nla-digbench/egcd.c
nla-digbench/egcd2.c
nla-digbench/egcd3.c
nla-digbench/lcm1.c
nla-digbench/prod4br.c

The problem also propagates to nla-digbench-scaling but I think the best would be to figure it out what to do with the base case and then using the script from @MartinSpiessl to generate the derived versions.

The question is how to fix these benchmarks from the reachability perspective (we should additionally keep them as they are for the no-overflow category).

I spent some time trying to find out a general solution, but nothing came to my mind. In issue #1105, it is said that using unsigned would solve the problem, but I don't think this apply to the general case (not event to this particular case, see this): the extra bit might not be enough if e.g. we multiply two non deterministic integers.

I thought adding something like assume_abort_if_not(X >= 1) <= INT_MAX and assume_abort_if_not(X >= 1) >= INT_MIN would do the trick even if we would need the check for every sub-expression, e.g. assert(x*y + a*b == 0) would require to add

assume_abort_if_not(x*y >= 1) <= INT_MAX
assume_abort_if_not(x*y >= 1) >= INT_MIN
assume_abort_if_not(a*b >= 1) <= INT_MAX
assume_abort_if_not(a*b >= 1) >= INT_MIN
assume_abort_if_not(x*y + a*b >= 1) <= INT_MAX
assume_abort_if_not(x*y + a*b >= 1) >= INT_MIN

however I'm not sure this works. I'm performing the verification on optimised code (using -O3 in clang) and I still get a wrong result. While this might be a bug on my verifier, I suspect it is actually that clang performs some aggressive optimisation due to undefined behaviour (this was the way I discover all other overflows).

Oct 09 '20 10:10 hernanponcedeleon

I already understood why the "solution" using assume_abort_if_not does not work: the check would already overflow.

@divyeshunadkat proposed in #1155 to use long long as the type or certain variables. However I still don't think the proposed solution would work in general case. It will work when we use only use + or -. Even if we only use on * (the extra 4 bytes should accommodate the result), however as soon as we have something like s*y*y (e.g. egcd2.c), we have problems again.

I don't think there are worse cases than those (using twice *) so one possibility would be to use long long (8 bytes) for the result variables and restrict the nondeterministic values to shorts (2 bytes). I think this we keep the complexity of the benchmarks without resulting in overflow.

Oct 11 '20 10:10 hernanponcedeleon

Now that #1155 has been merged, we can proceed to fix the inherited bugs in nla-digbench-scaling. @MartinSpiessl can you please re-run your script in the new fixed benchmarks from nla-digbench? You can assign me as a reviewer for the PR.

Oct 26 '20 08:10 hernanponcedeleon

I think there are still problems overflows with egad2-ll and egad3-ll. I tried checking overflows with CPAchecker and got violations:

scripts/cpa.sh -svcomp20 -heap 10000M -benchmark -timelimit '900 s' -64 -stats -spec ~/git/sv-benchmarks/c/properties/no-overflow.prp ~/git/sv-benchmarks/c/nla-digbench/egcd2.c -setprop solver.solver=SMTInterpol -setprop cpa.predicate.encodeBitvectorAs=INTEGER -setprop cpa.predicate.encodeFloatAs=RATIONAL

@MartinSpiessl told me the analysis might be unsound so I'm not sure. However, I get errors using Dartagnan in the scaled versions (nla-digbench-scaling) which might suggest there is undefined behaviour.

Nov 02 '20 17:11 hernanponcedeleon

please re-run your script in the new fixed benchmarks from nla-digbench?

I need to do so anyway to fix #1200 after the coverage property verdicts got merged first. I will probably open a new PR and close #1200 because the merge conflicts are not worth fixing if I can just regenerate everything from the generation script

@MartinSpiessl told me the analysis might be unsound so I'm not sure. However, I get errors using Dartagnan in the scaled versions (nla-digbench-scaling) which might suggest there is undefined behaviour.

Yes, linear integer arithmetic might be unsound, I will have a look.

Nov 10 '20 18:11 MartinSpiessl

sv-benchmarks sv-benchmarks copied to clipboard

Reachability benchmarks with overflow

sv-benchmarks
sv-benchmarks copied to clipboard