analyzer
analyzer copied to clipboard
Affine Equalities
This draft PR introduces an analysis for affine equalities (ref: Michael Karr , 1976) that can be run independently from the already existing apron analyses. Abstract states in the newly added domain are represented by structs containing a matrix and an apron environment. Matrices are modeled as proposed by Karr: Each variable is assigned to a column and each row represents a linear affine relationship that must hold at the corresponding program point. The apron environment is hereby used to organize the order of columns and variables. Furthermore, the apron analysis and its respective privatization modules have been modified to fit the needs for apron-independent relational analyses. A short list giving a quick (and perhaps not full) overview of what's been added/modified:
- affineEqualityDomain - Contains the majority of code for the new domain.
- sharedFunctions - Contains functions and modules that are shared among the original apronDomain and the new affineEqualityDomain
- relationDomain - Provides interfaces/implementations that generalize the apronDomain and affineEqualityDomain.
- relationAnalysis - Contains most of the implementation of the original apronDomain, but now solely operates with functions provided by relationDomain.
- relationPriv- Has been modified to work with any domain that uses the functions provided relationDomain
- An extended overflow handling inside relationAnalysis for expression assignments when overflows are assumed to occur. Since affine equalities can only keep track of integer bounds of expressions evaluating to definite constants, we now query the integer bounds information for expressions from other analysis. If an analysis returns bounds that are unequal to min and max of ikind , we can exclude the possibility that an overflow occurs and the abstract effect of the expression assignment can be used, i.e. we do not have to set the variable's value to top. Additionally, we now also refine after positive guards when overflows might occur and there is only one variable inside the expression and the expression is an equality constraint check (==). We check after the refinement if the new value of the variable is outside its integer bounds and if that is the case, either revert to the old state or set it to bottom.
The new analysis can be activated by the following parameter:
--set ana.activated[+] affeq
In order to compare affeq with the apron affine equalities implementation, the affeq analysis inside apron can be selected by the following parameter (when apron is activated):
--set ana.apron.domain "affeq"
- One can switch between the array and list matrices for the affine equality domain by the parameter:
--set ana.affeq.matrix list/array
A more detailed descriptions of all files/modules and interfaces:
-affineEqualityDomain (contains mostly new code)**:
Modules
-VarManagement: functor parameterized by AbstractMatrix and AbstractVector. It defines the type t of the affine equality domain (a struct that contains an optional matrix and an apron environment) and provides the functions needed for handling variables (which are defined by RelationDomain.D2) such as add_vars remove_vars. Furthermore, it provides the function get_coeff_vec that parses an apron expression into a vector of coefficients if the apron expression has an affine form.
-ExpressionBounds: functor that takes the same arguments as VarManagement. It's interface is SharedFunctions.ConvBounds and calculates the bounds of an apron expression. As it is specifically used for the new affine equality domain, it can only provide bounds if the expression contains known constants only and in that case, min and max are the same.
- D2: functor that takes the same arguments as VarManagement. Provides the rest of the functions required by
RelationDomain.D2(including the lattice operations such asmeet,join, etc.) and the main code for the new affine equality domain.
vectorMatrix Interfaces/module types
- RatOps: Abstracts the functions of the
Mpqfmodule for rationals from Apron that implements multi-precision rationals. One could later exchange "Mpqf" with a different module that provides the functions specified by this interface. - Vector: High-level abstraction of a vector.
- Matrix: High-level abstraction of a matrix.
- AbstractVector: functor that includes the signature of
Vectorand takesA:RatOpsas argument .It substitutes the abstract type num (short for "numerical") ofVectorwith type t of moduleA. Some functions inside have the suffix _pt_with, which means that the function could have potential side effects - AbstractMatrix; A functor including sig of Matrix with arguments
A:RatOpsandV:AbstractVector. Similarly to AbstractVector, AbstractMatrix substitutesnumofMatrixwith type t ofA:RatOpsbut also the abstract type vec with type t of a moduleV:AbstractVector. Some functions inside have the suffix _pt_with, which means that the function could have potential side effects
Modules
- ConvenienceOps: A functor parameterized by
RatOps. It provides more readable infix operators for the functions ofRatOps. It is designed to be included by modules that make use ofRatOps's functions - ListVector:
AbstractVectorwith type tA.t List.t. Reuses most of the already existingListfunctions such asmap2,rev, etc. and adds some for vector manipulations (e.g.insert_valremove_val...). None of its functions has side-effects - ListMatrix:
AbstractMatrixwith typeV.t list. In contrast toListVector, it does not include the functions ofListdirectly but uses them to provide unique ones. It also includes variousnormalizefunctions that transform a matrix into reduced row echelon form. These are in general not efficient as they perform a full Gauss-Jordan elimination that is slow for lists. - ArrayVector:
AbstractVectorwith type tA.t array(from theBatterieslibrary). Provides more efficient side-effecting functions thanListVector. Provides a copy function since the array data structure it contains is mutable. - ArrayMatrix:
AbstractMatrixwith typeA.t array array(from theBatterieslibrary). Provides more efficient side-effecting functions thanListMatrix. Similar toListMatrix, it provides a normalization function to reduce a matrix into reduced row echelon form. However, its normalization functions are faster not only because they use faster side-effect functions but also because they exploit that the input matrix/matrices are in reduced row echelon form already
ApronDomain (Contains most of its old code. Some parts have been moved. Not much new has been added) Changes:
- AffeqManager has been added which makes it possible to run the apron affine equality implementation as well
- Var has been moved to sharedFunctions as it is also used by affineEqualityDomain now.
- VarMetadataTbl and VM were moved to relationDomain.
- *Convert has been moved to sharedFunctions as it converts CIL expressions to apron expressions which is also used by
affineEqualityDomain - A few code elements for environment changes from functions as remove_vars etc. have been moved to sharedFunctions as they are needed in a similar way inside
affineEqualityDomain ovparameter has been added to functions where functions of the Convert module are used. This is to also to make used of the improved overflow handling.assert_inv,eval_interval_exprandeval_inthave been moved into a newAssertionModuleinsideSharedFunctionsas they are also used byaffineEqualityDomain- ApronComponents was renamed to
RelComponentand moved to RelationDomain - D2Complete and D2 have been added (containing functions of
RelationDomain.D2). - AD2Complete was added which includes the functions of
SharedFunctions.AssertionModuleand D2Complete. It contains all functions specified byRelationDomain.RelD2. AD2Complete is required byApronPrecCompareUtil
RelationDomain (Contains some of ApronDomain's code and new interfaces)
New module types/ Interfaces
- RelVar which abstracts the extended apron
Varthat was insideapronDomain(Moved tosharedFunctions). - D2 which defines most of the functions that are needed by
RelationAnalysis. It includes for instance lattice functions such asjoin,meet... Some of the functions end with the suffix _pt_with which indicates that the function could have potential side effects. In that case, the functions are supposed to return their modified input. - RelD2 which includes D2 but extends it by a few functions. This signature is used by the
AssertionModulein `sharedFunctions' - RD which serves as a parameter for
RelationAnalysis.SpecFunctor. The 3 modules that it contains must be implemented for each analysis and bundled into a module of this type. It is intended to group the implementation for variables and the domain.
Moved modules + module types/ interfaces
- RelVM (VM before) plus its interface VarMetadata, VarMetadataTbl and V were moved from
ApronDomainintoRelationDomain. Functor arguments for RelVar have been added. - RelComponent (before:ApronComponents) was also moved here from
ApronDomainand accepts RelD2 now as functor argument instead of ApronDomain.S2
SharedFunctions (Contains mostly code from apronDomain which has been adapted to fit to affineEqualityDomain`)
Moved modules + module types/interfaces
- Tracked from
apronDomain - Convert from
apronDomain. It takes a module of type ConvBounds as functor argument now. Used to convert CIL expressions to apron expressions.
New module types/interfaces
- ConvBounds an interface for Bounds which calculates bounds for expressions and is used inside the - Convert module
- AssertionRelD2 : a more specific module type for
RelationDomain.RelD2with ConvBounds integrated and various apron elements. It is designed to be the interface for the D2 modules inaffineEqualityDomainandapronDomainand serves as a functor argument for AssertionModule
New modules
- EnvOps: A module that includes various methods used by variable handling operations such as
add_vars,remove_varsetc. inapronDomainandaffineEqualityDomain - AssertionModule: Contains functions that are required by
RelationAnalysisand defined byRelationDomain.RelD2and are exactly the same forapronDomainandaffineEqualityDomain, i.e. they don`t have to be defined twice inside both domains. It uses the functions of its functor argument AssertionRelD2
RelationAnalysis (Contains the SpecFunctor of apronDomain)
- SpecFunctor accepts
RelationDomain.RDnow instead ofApronDomain.S2and uses the domain and variable provided by it. It also acceptsRelationPrecCompareUtil.Utilnow as functor argument asApronPrecCompareUtilusages inside have been removed. Some of the definitely side-effecting functions ending with _with are now expected to only potentially have side effects and their return values are therefore utilized.
ApronAnalysis
- ExtendedSpecFunctor: Uses the SpecFunctor defined inside
RelationAnalysisand extends it by data storing methods etc. It also passesApronPrecCompareUtiltoRelationAnalysis.SpecFunctor. - Initialization methods (such as
spec_module,get-spec) for apron analyses are also included and taken from the oldapronAnalysis. They were slightly adapted to the new module structure (e.g. module RD with apron domain etc. are initialized inside)
AffineEqualityAnalysis
- Similar to
ApronAnalysis, initialization methods for the new affine equality analysis were added. However, it does not add specific data storing methods toRelationAnalysis.SpecFunctorand passesRelationPrecCompareUtil.DummyUtilinstead of an apron-specific implementation to theSpecFunctor
RelationPrecCompareUtil and ApronPrecCompareUtil
- RelationPrecCompareUtil.Util abstracts ApronPrecCompareUtil now and also provides a DummyUtil for the new affine equality analysis
can be run independently from the already existing apron analyses.
Abstract states in the newly added domain are represented by structs containing a matrix and an apron environment.
I am immediately confused by this. If the new affeq domain/analysis uses Apron environments, variables and expressions, then it's not really independent from Apron. If Apron isn't installed, then the affeq domain cannot even be compiled.
To me, this is a particularly weird choice of abstraction, because it would be more logical to use non-Apron variables, expressions and environments (like the base analysis does) for the domain, which intends to be separate from Apron's. I would've expected the common interface between Apron domains and this affeq domain to use Cil.varinfo, Cil.exp, etc.
By still requiring the presence of Apron, the point of having this domain is defeated, because if I already have Apron, I might as well use the equivalent equalities domain provided by it. That's very likely more efficient and more reliable.
can be run independently from the already existing apron analyses.
Abstract states in the newly added domain are represented by structs containing a matrix and an apron environment.
I am immediately confused by this. If the new affeq domain/analysis uses Apron environments, variables and expressions, then it's not really independent from Apron. If Apron isn't installed, then the affeq domain cannot even be compiled.
To me, this is a particularly weird choice of abstraction, because it would be more logical to use non-Apron variables, expressions and environments (like the base analysis does) for the domain, which intends to be separate from Apron's. I would've expected the common interface between Apron domains and this affeq domain to use
Cil.varinfo,Cil.exp, etc.By still requiring the presence of Apron, the point of having this domain is defeated, because if I already have Apron, I might as well use the equivalent equalities domain provided by it. That's very likely more efficient and more reliable.
@sim642 The goal is not to get completely rid of apron. We rather tried to have an analysis that can be run in parallel to apron/other analyses. Currently, it's for example not possible to run the polyhedra and affeq analysis simultaneously. That's what we wanted to change. Plus, we"re also trying more or less to compare this implementation to the apron affeq analysis. The apron-independent interfaces that are provided can be seen as a nice-to-have feature so to speak. In future, one could add more relational analyses that use these interfaces without necessarily relying on apron, but this is beyond the scope of this specific project. It would also be possible to remove the few apron components that are used inside affeq domain/analysis, but again, this would be future work.
Currently, it's for example not possible to run the polyhedra and affeq analysis simultaneously. That's what we wanted to change.
Currently (on master) there's no affeq to begin with, so I guess you meant polyhedra (Polka.manager_alloc_loose) and equalities (Polka.manager_alloc_equalities)? That doesn't make sense either, because loose automatically can represent all equalities as well, so you wouldn't need to use them simultaneously.
A combination like octagon and equalities wouldn't fall under that, so isn't currently directly supported. But achieving that doesn't require implementing one of the two from scratch! Thanks to the genericity of ApronAnalyis.SpecFunctor, it could simply be instantiated twice with different Apron domains. The two registered analyses could automagically run simultaneously already.
Alternatively, Apron has (reduced) products of its domains, which could achieve the same.
Currently, it's for example not possible to run the polyhedra and affeq analysis simultaneously. That's what we wanted to change.
Currently (on
master) there's no affeq to begin with, so I guess you meant polyhedra (Polka.manager_alloc_loose) and equalities (Polka.manager_alloc_equalities)? That doesn't make sense either, becauselooseautomatically can represent all equalities as well, so you wouldn't need to use them simultaneously.A combination like octagon and equalities wouldn't fall under that, so isn't currently directly supported. But achieving that doesn't require implementing one of the two from scratch! Thanks to the genericity of
ApronAnalyis.SpecFunctor, it could simply be instantiated twice with different Apron domains. The two registered analyses could automagically run simultaneously already.Alternatively, Apron has (reduced) products of its domains, which could achieve the same.
True, there is indeed no affeq on master at the moment but adding it required just a few more lines of code. Yes, running affine equalities and polyhedra simultaneously doesn't make that much sense result-wise since polyhedra covers affeq as you said. For checking runtimes or other properties it could be interesting though.
Having a completely apron-free implementation also didn't make sense for us since we wanted to keep the apron abstract syntax tree plus its conversion functions. For comparing our affeq implementation with the apron one this was somewhat crucial as we didn't want ours to suffer from wrong/incomplete CIL expression handling. Thus, keeping it also gave us a better common basis for the comparison. In future one can decide if they want to reuse the AST + conversion functions from apron as well or leave it out entirely for implementations of other relational domains. Hope this gave a better justification for the existence of the new affeq analysis/domain
Do you perhaps have some remarks on the code as well? I would be really glad to get as much feedback as possible since the implementation is an integral part of my thesis.
Why does this PR update g2html? Were there any necessary changes, or is this by mistake?
This is a huge diff (by necessity) and that makes it very hard to review. Could you maybe provide an overview over the different modules (and module types) that are now used (there was a lot of them before already), and I think we now have more that are even more involved?
This would be similar in spirit to the overview over the different modules you give in the description of the PR, but at a higher level of detail. Also, it would be very helpful to indicate which files contain significant new code, and which contain code that has been moved from somewhere else. I think that would make it easier (for all of us) to review the PR in detail in detail and also offer more specific suggestions for some of the parts.
This is a huge diff (by necessity) and that makes it very hard to review. Could you maybe provide an overview over the different modules (and module types) that are now used (there was a lot of them before already), and I think we now have more that are even more involved?
This would be similar in spirit to the overview over the different modules you give in the description of the PR, but at a higher level of detail. Also, it would be very helpful to indicate which files contain significant new code, and which contain code that has been moved from somewhere else. I think that would make it easier (for all of us) to review the PR in detail in detail and also offer more specific suggestions for some of the parts.
I've added a more detailed description to my initial comment.
Having a completely apron-free implementation also didn't make sense for us since we wanted to keep the apron abstract syntax tree plus its conversion functions. For comparing our affeq implementation with the apron one this was somewhat crucial as we didn't want ours to suffer from wrong/incomplete CIL expression handling. Thus, keeping it also gave us a better common basis for the comparison.
But the Apron texpr1 that we get from CIL's exp is extremely close: https://github.com/goblint/analyzer/blob/4c7576af93584aae20ce7a7ba6f3edc34a378fa7/src/cdomains/apron/apronDomain.apron.ml#L158-L200
All it would've taken to not depend on Apron was in the affeq implementation to pattern match on CIL's BinOp (PlusA, ...) instead of Apron's Binop (Add, ...).
In future one can decide if they want to reuse the AST + conversion functions from apron as well or leave it out entirely for implementations of other relational domains.
I guess there's not much to be done about it at this point, but it's a weird early design decision that now is deeply integrated in this. It would require major refactoring now to change.
- Another major change is that functions with side effects (usually ending with "_with") defined inside
apronDomainwere replaced by functional equivalents
This point is easily missed when looking at the diffs because a rename is also involved, but this is a no-go. The in-place functions of Apron are a feature and we very intentionally use them to achieve better performance. Because the functional equivalents always involve a memory copy and then do that in-place operation on the new memory.
There's a reason Apron provides these. By reverting that optimization you're (maybe unintentionally) shooting the Apron domains in the foot and crippling their performance, which invalidates the core goal of these changes to compare the two.
- implementations for vectors and matrices (realized through lists and lists of lists. Could be replaced in future
This is also a major point when it comes to performance and its comparison. Functional lists are infamously inefficient for the kinds of operations vectors and matrices are used for here (indexing). If I were to guess, these list manipulations make up a major bottleneck.
- Another major change is that functions with side effects (usually ending with "_with") defined inside
apronDomainwere replaced by functional equivalentsThis point is easily missed when looking at the diffs because a rename is also involved, but this is a no-go. The in-place functions of Apron are a feature and we very intentionally use them to achieve better performance. Because the functional equivalents always involve a memory copy and then do that in-place operation on the new memory.
There's a reason Apron provides these. By reverting that optimization you're (maybe unintentionally) shooting the Apron domains in the foot and crippling their performance, which invalidates the core goal of these changes to compare the two.
That makes sense. We've added the in-place functions to relationAnalysis again and indicated by a suffix "_pt_with" that the functions could have potential side-effects (which is the case for all "_pt_with" labeled functions in the apron domain but not for the ones inside affeq) 236b3876260bc3b05e7e9891cf63bfceafab1eb1
- implementations for vectors and matrices (realized through lists and lists of lists. Could be replaced in future
This is also a major point when it comes to performance and its comparison. Functional lists are infamously inefficient for the kinds of operations vectors and matrices are used for here (indexing). If I were to guess, these list manipulations make up a major bottleneck.
Okay a quick update on this. We have added a matrix data structure that relies on the array implementation provided by the Batteries library that outperforms the list version by far. The latter was indeed a major bottleneck inside our implementation, but we were also expecting that of course. What also harmed our runtime was the extremely slow normalization function which was called inside leq for instance. A function that is used quite often... There was also this idea of using dynamic matrices from Batteries. However, we eventually decided against it since most functions actually need an array copying one way or another. For instance removing a variable and resizing the input array could potentially break something in the relationAnalysis as it expects the input to remain unchanged (side effects...). One of the few functions where it could make sense is the join operation as it involves multiple recursion steps that reshape remove rows quite often. We have inspected time stats, however and saw that usually the join operation does not harm the performance that much.
Might make sense to also add a regression test for such things, such that we'll notice if we inadvertently break it again in the future!
56b4125b384ef9a89f249f65e643ba036153a236
I've manually reduced one the smallest SV-Comp test where the mistake occured. The test fails one commit before the fix. After the fix commit it works perfectly fine of course.
Why does this PR update
g2html? Were there any necessary changes, or is this by mistake?
I've reverted the updates to all submodules. These were indeed by mistake 54907736c4d93fd9e4e35d2870839d394ef0f30a and a514d846864520867bdf596b29bca574b5a881fe
Could you merge master into this one last time? Then, I think I'll start another run on sv-comp and then we should be good to go!
I have not been able to do a complete run of Octagons before and after because benchexec froze, but there are some interesting issues: After this change, there are
- 126 new instances of
[EXCEPTION (Cilfacade.TypeOfError)] - 3 new
[EXCEPTION (Apron.Manager.Error)] - 4 new
[EXCEPTION (Goblint_lib.IntDomain.IncompatibleIKinds)] - 6 new
[EXCEPTION (Invalid_argument)] - 2 new unsoundness cases
I started a new run in the hopes that that will not freeze. We should however still take a look at these new issues.
After the fixes, most issues are fixed. comparison.zip
I think we should probably merge this to avoid to merge again and again, and also to make sure we don't do the merge shortly before the sv-comp deadline which may be risky. What do you think @sim642?
Indeed, we shouldn't make such significant changes right before SV-COMP. Based on a quick look at the related thesis, it seems like in SV-COMP this doesn't give us an advantage over octagons both in time and score, right?
Since this involves quite big changes (that are hard to follow on diffs alone), I'd want to take a good look at this again, because it has been a while. Instead of rushing this now, it seems slightly more reasonable to me to have this merged right after SV-COMP. I'm then willing to take care of any merge conflicts myself.
@sim642: For reference, this is Martin's thesis in case you want to look something up (e.g. performance numbers): https://versioncontrolseidl.in.tum.de/teaching/goblint-related-ba-or-ma-theses/-/blob/master/MA_2022_Implementing-and-Comparing-Linear-Relational-Domains_Martin-Wehking.pdf
@michael-schwarz I think I've now done all the tinkering on this that I intended, so you can also have a final look.
Just a reminder here at the end again: this is to be squash merged.
I guess we'll also need to make changes in https://github.com/goblint/bench now that this is merged.