gazer icon indicating copy to clipboard operation
gazer copied to clipboard

Support large int literals in Theta backend

Open hajduakos opened this issue 5 years ago • 7 comments

Previously Theta could only parse 32-bit integer literals, but with 1.7 (see #29) it should not be a limitation. We should check if Gazer has to be modified internally to be able to produce larger literals.

hajduakos avatar Aug 27 '20 15:08 hajduakos

Gazer converts integer type (short, long, signed, unsigned, etc.) variables in the given C code to int in the generated CFA. Theta 1.7 handles these as large int literals, so the new functionality seems to work without any internal change in Gazer.

AdamZsofi avatar Aug 28 '20 11:08 AdamZsofi

There is a nuance after 1<<63ull: it does not fit the range of int64_t (as it is used in IntLiteralExpr). So the only problem is constant unsigned long long values larger than that can fit in a signed long long.

I'd reopen this as a bug or create a new issue in Gazer, whichever is more appropriate :)

radl97 avatar Aug 28 '20 14:08 radl97

Reopening seems cleaner to me in this case.

AdamZsofi avatar Aug 28 '20 16:08 AdamZsofi

Thanks @radl97 for pointing this out. What could you imagine as a solution? Replacing Gazer's internal representation with some bigint (e.g. from boost)? Furthermore, is this also an issue for the BMC backend?

hajduakos avatar Aug 28 '20 18:08 hajduakos

LLVM has some solution, I think. For example APInteger maybe, which stores bitwidth with the value.

Note: I would skip Boost, as it is only used on the outer layers, mostly for filesystem handling.

Without giving much thought, I think BMC is affected when using Integer representations (instead of the default(?) bitvectors).

Disclaimer: Both my claims against BMC and Theta modules are untested, I've only seen how IntLitExpr stores integers, which led me to think the problem exists. Further investigation may show otherwise.

I gladly take up the quest, if you'd like :) (both verifying and solving the issue)

radl97 avatar Aug 28 '20 19:08 radl97

@radl97 if you feel like doing it, please go ahead. You can create a new branch and when you're done submit a PR. It would also be nice to come up with a simple test that fails currently, but will be fixed by the patch.

hajduakos avatar Aug 28 '20 21:08 hajduakos

@radl97 Indeed, we currently store large int literals in int64_t. I think we can handle this either with llvm::APSInt (like APInt, but also stores sign information) or boost::multiprecision. Both solutions are fine by me.

Note that you probably will have to update the expression hashing mechanism for int literals in GazerContextImpl.h, otherwise you might get a good number of undebuggable template errors :) The default implementation for primitive literals uses llvm::hash_value, probably we will need a specialization for llvm::hash_value and/or expr_hasher (if deciding equilitiy is not trivial).

sallaigy avatar Aug 29 '20 19:08 sallaigy