analyzer icon indicating copy to clipboard operation
analyzer copied to clipboard

Affine Equalities

Open MartinWehking opened this issue 3 years ago • 17 comments

This draft PR introduces an analysis for affine equalities (ref: Michael Karr , 1976) that can be run independently from the already existing apron analyses. Abstract states in the newly added domain are represented by structs containing a matrix and an apron environment. Matrices are modeled as proposed by Karr: Each variable is assigned to a column and each row represents a linear affine relationship that must hold at the corresponding program point. The apron environment is hereby used to organize the order of columns and variables. Furthermore, the apron analysis and its respective privatization modules have been modified to fit the needs for apron-independent relational analyses. A short list giving a quick (and perhaps not full) overview of what's been added/modified:

  • affineEqualityDomain - Contains the majority of code for the new domain.
  • sharedFunctions - Contains functions and modules that are shared among the original apronDomain and the new affineEqualityDomain
  • relationDomain - Provides interfaces/implementations that generalize the apronDomain and affineEqualityDomain.
  • relationAnalysis - Contains most of the implementation of the original apronDomain, but now solely operates with functions provided by relationDomain.
  • relationPriv- Has been modified to work with any domain that uses the functions provided relationDomain
  • An extended overflow handling inside relationAnalysis for expression assignments when overflows are assumed to occur. Since affine equalities can only keep track of integer bounds of expressions evaluating to definite constants, we now query the integer bounds information for expressions from other analysis. If an analysis returns bounds that are unequal to min and max of ikind , we can exclude the possibility that an overflow occurs and the abstract effect of the expression assignment can be used, i.e. we do not have to set the variable's value to top. Additionally, we now also refine after positive guards when overflows might occur and there is only one variable inside the expression and the expression is an equality constraint check (==). We check after the refinement if the new value of the variable is outside its integer bounds and if that is the case, either revert to the old state or set it to bottom.

The new analysis can be activated by the following parameter: --set ana.activated[+] affeq In order to compare affeq with the apron affine equalities implementation, the affeq analysis inside apron can be selected by the following parameter (when apron is activated): --set ana.apron.domain "affeq"

  • One can switch between the array and list matrices for the affine equality domain by the parameter: --set ana.affeq.matrix list/array

A more detailed descriptions of all files/modules and interfaces:

-affineEqualityDomain (contains mostly new code)**: Modules -VarManagement: functor parameterized by AbstractMatrix and AbstractVector. It defines the type t of the affine equality domain (a struct that contains an optional matrix and an apron environment) and provides the functions needed for handling variables (which are defined by RelationDomain.D2) such as add_vars remove_vars. Furthermore, it provides the function get_coeff_vec that parses an apron expression into a vector of coefficients if the apron expression has an affine form. -ExpressionBounds: functor that takes the same arguments as VarManagement. It's interface is SharedFunctions.ConvBounds and calculates the bounds of an apron expression. As it is specifically used for the new affine equality domain, it can only provide bounds if the expression contains known constants only and in that case, min and max are the same.

  • D2: functor that takes the same arguments as VarManagement. Provides the rest of the functions required by RelationDomain.D2 (including the lattice operations such as meet, join, etc.) and the main code for the new affine equality domain.

vectorMatrix Interfaces/module types

  • RatOps: Abstracts the functions of the Mpqf module for rationals from Apron that implements multi-precision rationals. One could later exchange "Mpqf" with a different module that provides the functions specified by this interface.
  • Vector: High-level abstraction of a vector.
  • Matrix: High-level abstraction of a matrix.
  • AbstractVector: functor that includes the signature of Vector and takes A:RatOps as argument .It substitutes the abstract type num (short for "numerical") of Vector with type t of module A. Some functions inside have the suffix _pt_with, which means that the function could have potential side effects
  • AbstractMatrix; A functor including sig of Matrix with arguments A:RatOps and V:AbstractVector. Similarly to AbstractVector, AbstractMatrix substitutes num of Matrix with type t of A:RatOps but also the abstract type vec with type t of a module V:AbstractVector. Some functions inside have the suffix _pt_with, which means that the function could have potential side effects

Modules

  • ConvenienceOps: A functor parameterized by RatOps . It provides more readable infix operators for the functions of RatOps. It is designed to be included by modules that make use of RatOps's functions
  • ListVector: AbstractVector with type t A.t List.t. Reuses most of the already existing List functions such as map2, rev, etc. and adds some for vector manipulations (e.g. insert_val remove_val...). None of its functions has side-effects
  • ListMatrix: AbstractMatrix with type V.t list. In contrast to ListVector, it does not include the functions of List directly but uses them to provide unique ones. It also includes various normalize functions that transform a matrix into reduced row echelon form. These are in general not efficient as they perform a full Gauss-Jordan elimination that is slow for lists.
  • ArrayVector: AbstractVector with type t A.t array (from the Batteries library). Provides more efficient side-effecting functions than ListVector. Provides a copy function since the array data structure it contains is mutable.
  • ArrayMatrix: AbstractMatrix with type A.t array array (from the Batteries library). Provides more efficient side-effecting functions than ListMatrix. Similar to ListMatrix, it provides a normalization function to reduce a matrix into reduced row echelon form. However, its normalization functions are faster not only because they use faster side-effect functions but also because they exploit that the input matrix/matrices are in reduced row echelon form already

ApronDomain (Contains most of its old code. Some parts have been moved. Not much new has been added) Changes:

  • AffeqManager has been added which makes it possible to run the apron affine equality implementation as well
  • Var has been moved to sharedFunctions as it is also used by affineEqualityDomain now.
  • VarMetadataTbl and VM were moved to relationDomain.
  • *Convert has been moved to sharedFunctions as it converts CIL expressions to apron expressions which is also used by affineEqualityDomain
  • A few code elements for environment changes from functions as remove_vars etc. have been moved to sharedFunctions as they are needed in a similar way inside affineEqualityDomain
  • ov parameter has been added to functions where functions of the Convert module are used. This is to also to make used of the improved overflow handling.
  • assert_inv, eval_interval_expr and eval_int have been moved into a new AssertionModule inside SharedFunctions as they are also used by affineEqualityDomain
  • ApronComponents was renamed to RelComponent and moved to RelationDomain
  • D2Complete and D2 have been added (containing functions of RelationDomain.D2).
  • AD2Complete was added which includes the functions of SharedFunctions.AssertionModule and D2Complete. It contains all functions specified by RelationDomain.RelD2 . AD2Complete is required by ApronPrecCompareUtil

RelationDomain (Contains some of ApronDomain's code and new interfaces)

New module types/ Interfaces

  • RelVar which abstracts the extended apron Var that was inside apronDomain (Moved to sharedFunctions).
  • D2 which defines most of the functions that are needed by RelationAnalysis. It includes for instance lattice functions such as join, meet ... Some of the functions end with the suffix _pt_with which indicates that the function could have potential side effects. In that case, the functions are supposed to return their modified input.
  • RelD2 which includes D2 but extends it by a few functions. This signature is used by the AssertionModule in `sharedFunctions'
  • RD which serves as a parameter for RelationAnalysis.SpecFunctor. The 3 modules that it contains must be implemented for each analysis and bundled into a module of this type. It is intended to group the implementation for variables and the domain.

Moved modules + module types/ interfaces

  • RelVM (VM before) plus its interface VarMetadata, VarMetadataTbl and V were moved from ApronDomain into RelationDomain. Functor arguments for RelVar have been added.
  • RelComponent (before:ApronComponents) was also moved here from ApronDomain and accepts RelD2 now as functor argument instead of ApronDomain.S2

SharedFunctions (Contains mostly code from apronDomain which has been adapted to fit to affineEqualityDomain`) Moved modules + module types/interfaces

  • Tracked from apronDomain
  • Convert from apronDomain. It takes a module of type ConvBounds as functor argument now. Used to convert CIL expressions to apron expressions.

New module types/interfaces

  • ConvBounds an interface for Bounds which calculates bounds for expressions and is used inside the - Convert module
  • AssertionRelD2 : a more specific module type for RelationDomain.RelD2 with ConvBounds integrated and various apron elements. It is designed to be the interface for the D2 modules in affineEqualityDomain and apronDomain and serves as a functor argument for AssertionModule

New modules

  • EnvOps: A module that includes various methods used by variable handling operations such as add_vars, remove_vars etc. in apronDomain and affineEqualityDomain
  • AssertionModule: Contains functions that are required by RelationAnalysis and defined by RelationDomain.RelD2 and are exactly the same for apronDomain and affineEqualityDomain , i.e. they don`t have to be defined twice inside both domains. It uses the functions of its functor argument AssertionRelD2

RelationAnalysis (Contains the SpecFunctor of apronDomain)

  • SpecFunctor accepts RelationDomain.RD now instead of ApronDomain.S2 and uses the domain and variable provided by it. It also accepts RelationPrecCompareUtil.Util now as functor argument as ApronPrecCompareUtil usages inside have been removed. Some of the definitely side-effecting functions ending with _with are now expected to only potentially have side effects and their return values are therefore utilized.

ApronAnalysis

  • ExtendedSpecFunctor: Uses the SpecFunctor defined inside RelationAnalysis and extends it by data storing methods etc. It also passes ApronPrecCompareUtil to RelationAnalysis.SpecFunctor.
  • Initialization methods (such as spec_module, get-spec) for apron analyses are also included and taken from the old apronAnalysis. They were slightly adapted to the new module structure (e.g. module RD with apron domain etc. are initialized inside)

AffineEqualityAnalysis

  • Similar to ApronAnalysis, initialization methods for the new affine equality analysis were added. However, it does not add specific data storing methods to RelationAnalysis.SpecFunctor and passes RelationPrecCompareUtil.DummyUtil instead of an apron-specific implementation to the SpecFunctor

RelationPrecCompareUtil and ApronPrecCompareUtil

  • RelationPrecCompareUtil.Util abstracts ApronPrecCompareUtil now and also provides a DummyUtil for the new affine equality analysis

MartinWehking avatar Feb 08 '22 16:02 MartinWehking

can be run independently from the already existing apron analyses.

Abstract states in the newly added domain are represented by structs containing a matrix and an apron environment.

I am immediately confused by this. If the new affeq domain/analysis uses Apron environments, variables and expressions, then it's not really independent from Apron. If Apron isn't installed, then the affeq domain cannot even be compiled.

To me, this is a particularly weird choice of abstraction, because it would be more logical to use non-Apron variables, expressions and environments (like the base analysis does) for the domain, which intends to be separate from Apron's. I would've expected the common interface between Apron domains and this affeq domain to use Cil.varinfo, Cil.exp, etc.

By still requiring the presence of Apron, the point of having this domain is defeated, because if I already have Apron, I might as well use the equivalent equalities domain provided by it. That's very likely more efficient and more reliable.

sim642 avatar Feb 08 '22 17:02 sim642

can be run independently from the already existing apron analyses.

Abstract states in the newly added domain are represented by structs containing a matrix and an apron environment.

I am immediately confused by this. If the new affeq domain/analysis uses Apron environments, variables and expressions, then it's not really independent from Apron. If Apron isn't installed, then the affeq domain cannot even be compiled.

To me, this is a particularly weird choice of abstraction, because it would be more logical to use non-Apron variables, expressions and environments (like the base analysis does) for the domain, which intends to be separate from Apron's. I would've expected the common interface between Apron domains and this affeq domain to use Cil.varinfo, Cil.exp, etc.

By still requiring the presence of Apron, the point of having this domain is defeated, because if I already have Apron, I might as well use the equivalent equalities domain provided by it. That's very likely more efficient and more reliable.

@sim642 The goal is not to get completely rid of apron. We rather tried to have an analysis that can be run in parallel to apron/other analyses. Currently, it's for example not possible to run the polyhedra and affeq analysis simultaneously. That's what we wanted to change. Plus, we"re also trying more or less to compare this implementation to the apron affeq analysis. The apron-independent interfaces that are provided can be seen as a nice-to-have feature so to speak. In future, one could add more relational analyses that use these interfaces without necessarily relying on apron, but this is beyond the scope of this specific project. It would also be possible to remove the few apron components that are used inside affeq domain/analysis, but again, this would be future work.

MartinWehking avatar Feb 09 '22 13:02 MartinWehking

Currently, it's for example not possible to run the polyhedra and affeq analysis simultaneously. That's what we wanted to change.

Currently (on master) there's no affeq to begin with, so I guess you meant polyhedra (Polka.manager_alloc_loose) and equalities (Polka.manager_alloc_equalities)? That doesn't make sense either, because loose automatically can represent all equalities as well, so you wouldn't need to use them simultaneously.

A combination like octagon and equalities wouldn't fall under that, so isn't currently directly supported. But achieving that doesn't require implementing one of the two from scratch! Thanks to the genericity of ApronAnalyis.SpecFunctor, it could simply be instantiated twice with different Apron domains. The two registered analyses could automagically run simultaneously already.

Alternatively, Apron has (reduced) products of its domains, which could achieve the same.

sim642 avatar Feb 09 '22 14:02 sim642

Currently, it's for example not possible to run the polyhedra and affeq analysis simultaneously. That's what we wanted to change.

Currently (on master) there's no affeq to begin with, so I guess you meant polyhedra (Polka.manager_alloc_loose) and equalities (Polka.manager_alloc_equalities)? That doesn't make sense either, because loose automatically can represent all equalities as well, so you wouldn't need to use them simultaneously.

A combination like octagon and equalities wouldn't fall under that, so isn't currently directly supported. But achieving that doesn't require implementing one of the two from scratch! Thanks to the genericity of ApronAnalyis.SpecFunctor, it could simply be instantiated twice with different Apron domains. The two registered analyses could automagically run simultaneously already.

Alternatively, Apron has (reduced) products of its domains, which could achieve the same.

True, there is indeed no affeq on master at the moment but adding it required just a few more lines of code. Yes, running affine equalities and polyhedra simultaneously doesn't make that much sense result-wise since polyhedra covers affeq as you said. For checking runtimes or other properties it could be interesting though.

Having a completely apron-free implementation also didn't make sense for us since we wanted to keep the apron abstract syntax tree plus its conversion functions. For comparing our affeq implementation with the apron one this was somewhat crucial as we didn't want ours to suffer from wrong/incomplete CIL expression handling. Thus, keeping it also gave us a better common basis for the comparison. In future one can decide if they want to reuse the AST + conversion functions from apron as well or leave it out entirely for implementations of other relational domains. Hope this gave a better justification for the existence of the new affeq analysis/domain

Do you perhaps have some remarks on the code as well? I would be really glad to get as much feedback as possible since the implementation is an integral part of my thesis.

MartinWehking avatar Feb 18 '22 18:02 MartinWehking

Why does this PR update g2html? Were there any necessary changes, or is this by mistake?

michael-schwarz avatar Feb 19 '22 10:02 michael-schwarz

This is a huge diff (by necessity) and that makes it very hard to review. Could you maybe provide an overview over the different modules (and module types) that are now used (there was a lot of them before already), and I think we now have more that are even more involved?

This would be similar in spirit to the overview over the different modules you give in the description of the PR, but at a higher level of detail. Also, it would be very helpful to indicate which files contain significant new code, and which contain code that has been moved from somewhere else. I think that would make it easier (for all of us) to review the PR in detail in detail and also offer more specific suggestions for some of the parts.

michael-schwarz avatar Feb 19 '22 11:02 michael-schwarz

This is a huge diff (by necessity) and that makes it very hard to review. Could you maybe provide an overview over the different modules (and module types) that are now used (there was a lot of them before already), and I think we now have more that are even more involved?

This would be similar in spirit to the overview over the different modules you give in the description of the PR, but at a higher level of detail. Also, it would be very helpful to indicate which files contain significant new code, and which contain code that has been moved from somewhere else. I think that would make it easier (for all of us) to review the PR in detail in detail and also offer more specific suggestions for some of the parts.

I've added a more detailed description to my initial comment.

MartinWehking avatar Feb 26 '22 13:02 MartinWehking

Having a completely apron-free implementation also didn't make sense for us since we wanted to keep the apron abstract syntax tree plus its conversion functions. For comparing our affeq implementation with the apron one this was somewhat crucial as we didn't want ours to suffer from wrong/incomplete CIL expression handling. Thus, keeping it also gave us a better common basis for the comparison.

But the Apron texpr1 that we get from CIL's exp is extremely close: https://github.com/goblint/analyzer/blob/4c7576af93584aae20ce7a7ba6f3edc34a378fa7/src/cdomains/apron/apronDomain.apron.ml#L158-L200 All it would've taken to not depend on Apron was in the affeq implementation to pattern match on CIL's BinOp (PlusA, ...) instead of Apron's Binop (Add, ...).

In future one can decide if they want to reuse the AST + conversion functions from apron as well or leave it out entirely for implementations of other relational domains.

I guess there's not much to be done about it at this point, but it's a weird early design decision that now is deeply integrated in this. It would require major refactoring now to change.

sim642 avatar Mar 02 '22 10:03 sim642

  • Another major change is that functions with side effects (usually ending with "_with") defined inside apronDomain were replaced by functional equivalents

This point is easily missed when looking at the diffs because a rename is also involved, but this is a no-go. The in-place functions of Apron are a feature and we very intentionally use them to achieve better performance. Because the functional equivalents always involve a memory copy and then do that in-place operation on the new memory.

There's a reason Apron provides these. By reverting that optimization you're (maybe unintentionally) shooting the Apron domains in the foot and crippling their performance, which invalidates the core goal of these changes to compare the two.

sim642 avatar Mar 02 '22 12:03 sim642

  • implementations for vectors and matrices (realized through lists and lists of lists. Could be replaced in future

This is also a major point when it comes to performance and its comparison. Functional lists are infamously inefficient for the kinds of operations vectors and matrices are used for here (indexing). If I were to guess, these list manipulations make up a major bottleneck.

sim642 avatar Mar 02 '22 12:03 sim642

  • Another major change is that functions with side effects (usually ending with "_with") defined inside apronDomain were replaced by functional equivalents

This point is easily missed when looking at the diffs because a rename is also involved, but this is a no-go. The in-place functions of Apron are a feature and we very intentionally use them to achieve better performance. Because the functional equivalents always involve a memory copy and then do that in-place operation on the new memory.

There's a reason Apron provides these. By reverting that optimization you're (maybe unintentionally) shooting the Apron domains in the foot and crippling their performance, which invalidates the core goal of these changes to compare the two.

That makes sense. We've added the in-place functions to relationAnalysis again and indicated by a suffix "_pt_with" that the functions could have potential side-effects (which is the case for all "_pt_with" labeled functions in the apron domain but not for the ones inside affeq) 236b3876260bc3b05e7e9891cf63bfceafab1eb1

MartinWehking avatar Mar 28 '22 14:03 MartinWehking

  • implementations for vectors and matrices (realized through lists and lists of lists. Could be replaced in future

This is also a major point when it comes to performance and its comparison. Functional lists are infamously inefficient for the kinds of operations vectors and matrices are used for here (indexing). If I were to guess, these list manipulations make up a major bottleneck.

Okay a quick update on this. We have added a matrix data structure that relies on the array implementation provided by the Batteries library that outperforms the list version by far. The latter was indeed a major bottleneck inside our implementation, but we were also expecting that of course. What also harmed our runtime was the extremely slow normalization function which was called inside leq for instance. A function that is used quite often... There was also this idea of using dynamic matrices from Batteries. However, we eventually decided against it since most functions actually need an array copying one way or another. For instance removing a variable and resizing the input array could potentially break something in the relationAnalysis as it expects the input to remain unchanged (side effects...). One of the few functions where it could make sense is the join operation as it involves multiple recursion steps that reshape remove rows quite often. We have inspected time stats, however and saw that usually the join operation does not harm the performance that much.

MartinWehking avatar May 26 '22 19:05 MartinWehking

Might make sense to also add a regression test for such things, such that we'll notice if we inadvertently break it again in the future!

56b4125b384ef9a89f249f65e643ba036153a236

I've manually reduced one the smallest SV-Comp test where the mistake occured. The test fails one commit before the fix. After the fix commit it works perfectly fine of course.

MartinWehking avatar Jun 03 '22 11:06 MartinWehking

Why does this PR update g2html? Were there any necessary changes, or is this by mistake?

I've reverted the updates to all submodules. These were indeed by mistake 54907736c4d93fd9e4e35d2870839d394ef0f30a and a514d846864520867bdf596b29bca574b5a881fe

MartinWehking avatar Jun 15 '22 08:06 MartinWehking

Could you merge master into this one last time? Then, I think I'll start another run on sv-comp and then we should be good to go!

michael-schwarz avatar Jul 13 '22 07:07 michael-schwarz

I have not been able to do a complete run of Octagons before and after because benchexec froze, but there are some interesting issues: After this change, there are

  • 126 new instances of [EXCEPTION (Cilfacade.TypeOfError)]
  • 3 new [EXCEPTION (Apron.Manager.Error)]
  • 4 new [EXCEPTION (Goblint_lib.IntDomain.IncompatibleIKinds)]
  • 6 new [EXCEPTION (Invalid_argument)]
  • 2 new unsoundness cases

I started a new run in the hopes that that will not freeze. We should however still take a look at these new issues.

diff.zip

michael-schwarz avatar Oct 04 '22 07:10 michael-schwarz

The run of octagons before and after finished now, here are the results.

results.zip

michael-schwarz avatar Oct 06 '22 11:10 michael-schwarz

After the fixes, most issues are fixed. comparison.zip

I think we should probably merge this to avoid to merge again and again, and also to make sure we don't do the merge shortly before the sv-comp deadline which may be risky. What do you think @sim642?

michael-schwarz avatar Oct 23 '22 10:10 michael-schwarz

Indeed, we shouldn't make such significant changes right before SV-COMP. Based on a quick look at the related thesis, it seems like in SV-COMP this doesn't give us an advantage over octagons both in time and score, right?

Since this involves quite big changes (that are hard to follow on diffs alone), I'd want to take a good look at this again, because it has been a while. Instead of rushing this now, it seems slightly more reasonable to me to have this merged right after SV-COMP. I'm then willing to take care of any merge conflicts myself.

sim642 avatar Oct 24 '22 07:10 sim642

@sim642: For reference, this is Martin's thesis in case you want to look something up (e.g. performance numbers): https://versioncontrolseidl.in.tum.de/teaching/goblint-related-ba-or-ma-theses/-/blob/master/MA_2022_Implementing-and-Comparing-Linear-Relational-Domains_Martin-Wehking.pdf

michael-schwarz avatar Nov 21 '22 13:11 michael-schwarz

@michael-schwarz I think I've now done all the tinkering on this that I intended, so you can also have a final look.

Just a reminder here at the end again: this is to be squash merged.

sim642 avatar Dec 01 '22 12:12 sim642

I guess we'll also need to make changes in https://github.com/goblint/bench now that this is merged.

michael-schwarz avatar Dec 18 '22 14:12 michael-schwarz