manticore
manticore copied to clipboard
Reduce the number of solver queries
There are several places where we do too many queries to the solvers.
For example, when generating a testcase, we do 7 queries to the solver, while the constraints are the same https://github.com/trailofbits/manticore/blob/3da969bf49c69c2a0f00088a96643d8b1b75264f/manticore/platforms/evm.py#L185-L191
I replaced this in dev-evm-experiments:
https://github.com/trailofbits/manticore/blob/f0d59ce0021994b9a9b6ea3df3ccfb4719426aad/manticore/platforms/evm.py#L195-L203
But I had to implement solve_one_n_optimized, where optimized meant that it allows querying the solver for multiple values with one query, instead of multiple one.
The diff can be seen between https://github.com/trailofbits/manticore/blob/f0d59ce0021994b9a9b6ea3df3ccfb4719426aad/manticore/core/smtlib/solver.py#L1006
And https://github.com/trailofbits/manticore/blob/f0d59ce0021994b9a9b6ea3df3ccfb4719426aad/manticore/core/smtlib/solver.py#L1100
However, the current implementation of _optimized is a bit too much redundant with the original get_value version.
We need to investigate:
- How to have a clean interface allowing to query the solver for multiple values with one query. I think @feliam was working on a
get_model, function that could be the solution here - Look on the EVM interactions with the solvers, and identify solvers queries that can be removed. Ideally we would have a
solve_statefunction, that would give all the variables at once, and be cached, so that we avoid duplicate queries
From the performance perspective:
, when generating a testcase, we do 7 queries to the solver, while the constraints are the same
With the configuration we very frequently use, this won't give a very large improvement because solve_one discard any concrete values, and most of these values are concrete (therefore the solver is not called). Still, not very optimal in general if you run manticore with a fully symbolic transactions (e.g. when sender/balance is symbolic).
How to have a clean interface allowing to query the solver for multiple values with one query. I think @feliam was working on a get_model, function that could be the solution here
I was under the impression that if we don't add any constraint, asking for value takes almost no time. So this should also won't give a very large improvement.
Look on the EVM interactions with the solvers, and identify solvers queries that can be removed. Ideally we would have a solve_state function, that would give all the variables at once, and be cached, so that we avoid duplicate queries
This is an interesting idea, if we can cache "generic" queries were we know the answers for them.
I was under the impression that if we don't add any constraint, asking for value takes almost no time. So this should also won't give a very large improvement.
Yes. The model is fixed (for example) in a check-sat and then the following get-values will take the values out of the last known model. AFAIK If you can be sure there are no new model changing constraints between two get-values (And no check-sat!) you can avoid doing redundant stuff.
--
How to have a clean interface allowing to query the solver for multiple values with one query. I think @feliam was working on a get_model, function that could be the solution here"
get_model(): https://github.com/trailofbits/manticore/pull/1729/files#diff-7c098f0347b9f7d7d97944212fd9fb416999ce6414be4add9367a5fd20a88937R404
But I had to implement solve_one_n_optimized, where optimized meant that it allows querying the solver for multiple values with one query, instead of multiple one.
Another get_value for multiple expressions
https://github.com/trailofbits/manticore/pull/1729/files#diff-7c098f0347b9f7d7d97944212fd9fb416999ce6414be4add9367a5fd20a88937R640--
-- While having a faster no redundant generate-testcase is good and necesary note that there is no strong need to generate an input for no interesting states. The time it spends in the generate-testcase is the time it spends after you know there is a bug.
With the configuration we very frequently use, this won't give a very large improvement because solve_one discard any concrete values, and most of these values are concrete (therefore the solver is not called). Still, not very optimal in general if you run manticore with a fully symbolic transactions (e.g. when sender/balance is symbolic).
Well at least conc_data and conc_return_data should be symbolic in most of the cases right? So it's at least a x2 speed up per testcase generation. So if you have to generate 200 testcases, that still a significant speed up I think.
I was under the impression that if we don't add any constraint, asking for value takes almost no time. So this should also won't give a very large improvement.
Asking for values to the solver takes no time, but every call to solve_one reset the solver and call check-sat:
https://github.com/trailofbits/manticore/blob/3da969bf49c69c2a0f00088a96643d8b1b75264f/manticore/core/smtlib/solver.py#L627-L629
As far as I know, we don't have a solver-level API that keeps the underlying solver in a state of "I replied SAT, now you can query the values you want". This is only done within manticore/core/smtlib/solver.py.
Put in another word, every time you call Solver with any of the functions from:
https://github.com/trailofbits/manticore/blob/3da969bf49c69c2a0f00088a96643d8b1b75264f/manticore/core/smtlib/solver.py#L100-L163
You do a check-sat (unless I am mistaken)