Dat3M
Wmm Restructuring
This PR is a massive restructuring of the package `com.dat3m.dartagnan.wmm`. It contains several interface changes. Be careful!
Summary
The following changes are included:

- ~~`PropertyEncoder` is now responsible for 'LastCoConstraints', since it manages both current reasons of its existence.~~
- ~~`RelationRepository` is merged into `Wmm`.~~
- ~~`R^*` is replaced with `(R|id)^+`.~~
- ~~`id` is replaced with `[V]`.~~
- ~~`Constraint` unifies both `Axiom` and `Definition` (formerly `Relation`).~~
- ~~The unused constant `Relation.PostFixApprox` becomes the option `encoding.wmm.postFixApproximate`.~~
- Adds the (yet unused) capability of n-ary unions and intersections.
- `Relation.encodeTupleSet` is moved into an analysis-like manager.
- `Relation.maxTupleSet` etc. are moved into `RelationAnalysis`.
- Every relation in the model becomes redefinable.
- The `Relation` class hierarchy is flattened such that all current implementations directly extend the common base.
- That common base is now called `Definition`, which was extracted from `Relation`. Those two classes now refer to each other.
- `RecursiveRelation` and `RecursiveGroup` become obsolete and are removed.
Edit: Items 1 to 5 are already merged into `development`. Point 6 was discarded.
I'm not really a fan of the default naming for relations. I think I would prefer unnamed relations instead. They would only be identified by their definition.
For example, the output of `wmm.toString` is quite unreadable compared to before due to the default naming.
Also, we want to distinguish between named and unnamed relations when flattening sequences of binary unions into n-ary unions. While explicitly checking for the default naming works, it feels awkward to do so.
I did try to implement a pass to do this flattening, but the current framework misses redefinitions as well as the capability to delete relations (and axioms).
I think we should try to get the functionality of `Wmm` to a level where implementing the flattening is possible and easy, to make sure that the rework is actually worthwhile.
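For illustration, such a flattening pass could collect the operands of nested unnamed unions roughly as sketched below. This is a minimal sketch only: the accessors `getDefinition()`, `hasName()`, and `getOperands()` are assumed names, not the actual Dat3M API.

```java
// Minimal sketch of collecting operands for an n-ary union; the accessors
// used here are assumed names for illustration, not the actual Dat3M API.
private void collectUnionOperands(Relation relation, List<Relation> operands) {
    Definition definition = relation.getDefinition();
    if (definition instanceof Union && !relation.hasName()) {
        // Unnamed unions are transparent: descend into their operands.
        for (Relation operand : ((Union) definition).getOperands()) {
            collectUnionOperands(operand, operands);
        }
    } else {
        // Named relations and non-unions become direct operands of the n-ary union.
        operands.add(relation);
    }
}
```

Note that the `hasName()` check is exactly where an explicit named/unnamed distinction would be cleaner than comparing against the default naming.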
Another issue I see is that the active set (`encodeSet`) computation is now completely hidden inside `WmmEncoder`/`EncodeSets`.
How would one go about computing partial encode sets for lazy encodings?
My only guess would be to create alternative "lazy" versions of `WmmEncoder` and `EncodeSets` to get the desired behaviour.
> How would one go about computing partial encode sets for lazy encodings?
It should be no problem to add methods like the ones below to `WmmEncoder`. Keeping track of `alreadyEncoded` would be the responsibility of `RefinementSolver`.
```java
Map<Relation, Set<Tuple>> getEncodeSetClosure(
        Map<Relation, Set<Tuple>> reasons,
        Map<Relation, Set<Tuple>> alreadyEncoded);

BooleanFormula encodeTuples(Map<Relation, Set<Tuple>> tuples);
```
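Roughly, the refinement loop could then drive these methods as sketched below. The loop structure and the helpers `isRefinementDone()` and `reasonsFromLastCounterexample()` are assumptions for the sake of the example, not existing code.

```java
// Hypothetical refinement loop; isRefinementDone() and
// reasonsFromLastCounterexample() are assumed helpers for illustration.
Map<Relation, Set<Tuple>> alreadyEncoded = new HashMap<>();
while (!isRefinementDone()) {
    Map<Relation, Set<Tuple>> reasons = reasonsFromLastCounterexample();
    Map<Relation, Set<Tuple>> newTuples = encoder.getEncodeSetClosure(reasons, alreadyEncoded);
    BooleanFormula delta = encoder.encodeTuples(newTuples);
    prover.addConstraint(delta); // push only the newly encoded part
    // Record what has been encoded so that the next closure stays incremental.
    newTuples.forEach((rel, tuples) ->
            alreadyEncoded.computeIfAbsent(rel, k -> new HashSet<>()).addAll(tuples));
}
```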
Those methods would be helpful. I agree that `alreadyEncoded` needs to be tracked elsewhere (e.g., in the `RefinementSolver`).
I would probably name the second method like this:
```java
BooleanFormula encodeRelations(Map<Relation, Set<Tuple>> tuplesToBeEncoded);
```
Then the current (default) implementation can call this with the appropriate map that contains everything computed by the active set analysis (btw. we should eventually rename encode sets to active sets).
For the first one, I'm not sure why you called the first parameter `reasons`. I assume that map expresses what the user wants to encode, i.e., the parameters would be `toBeEncoded` and `alreadyEncoded` instead. The default implementation would pass as the first argument a map filled with the encode sets provided by the axioms (plus those needed for `rf` and `co`) and an empty map for `alreadyEncoded`.
Then the full eager encoding would be achieved by

```java
Map<Relation, Set<Tuple>> toBeEncoded = ... // add axioms etc.
Map<Relation, Set<Tuple>> alreadyEncoded = Map.of();
BooleanFormula enc = encodeRelations(getEncodeSetClosure(toBeEncoded, alreadyEncoded)); // or encodeTuples
```
right? I would like this design quite a bit more than the current one. Those methods give nice flexibility to the caller and are less state-dependent.
Relations that are defined via `let r = ...` are unnamed now. The parser adds an alias, but the relation itself remains unnamed.
This exposes a bug in my default implementation of `Definition#substitute(Relation,Relation)`. I will have to add implementations to all definitions. ~~Since it makes no sense for the method to be public at all, I will implement a visitor.~~
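For illustration, a per-definition implementation could look roughly like the sketch below for a union definition. The accessors `getOperands()` and `getDefinedRelation()` are assumed names, not the actual Dat3M API.

```java
// Sketch of a substitute override inside a (hypothetical) Union definition.
// getOperands() and getDefinedRelation() are assumed names for illustration.
@Override
public Definition substitute(Relation from, Relation to) {
    List<Relation> operands = new ArrayList<>();
    for (Relation operand : getOperands()) {
        // Replace every occurrence of 'from' by 'to'.
        operands.add(operand.equals(from) ? to : operand);
    }
    return new Union(getDefinedRelation(), operands);
}
```

Each definition would rebuild itself with the substituted operands in the same manner.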
I suggest cleaning up the `WmmEncoder` class a little bit.
I think we cannot have methods named like `encodeSAT`, `encodeIDL`, and `encodeEdgeSeq` within such a monolithic class. The methods are specifically made for coherence and read-from and should be named appropriately.
Also, the interleaving of helper methods and encoding methods in the program text is not nice.
I would prefer it if the code were systematically grouped, with some comment lines to visualize the grouping.
There are multiple ways to do so. I think it is best to keep logically related code close together so that, e.g., `visitMemoryOrder` is followed by `encodeIDL`/`encodeSAT`.
Also, I'm not too sure about the naming convention. Why do we now call the coherence order the memory order? Wouldn't it be better to stay close to the standard names?
Regarding the order, I followed a field-constructor-method-first, accessibility-second approach. I would like to inline those methods, since they are basically named blocks (one usage) and currently have about 20 similar lines of code.
As for the naming Memory Order for `co`, I found it easier to explain its semantics to others by referring to it this way. Also, Coherence was a synonym for the SC-per-Location axiom in some literature that I consulted back then.
> Regarding the order, I followed a field-constructor-method-first, accessibility-second approach. I would like to inline those methods, since they are basically named blocks (one usage) and currently have about 20 similar lines of code.
That order is fine on the macroscopic level, but the ordering among the methods can still be smarter. Also, inlining just because there is a single usage is not necessarily good. Code sharing always comes with coupling, so if you inline those functions to share code, then whenever you want to adapt the SAT encoding in the future, you will always have to think about the IDL encoding as well (and vice versa) due to their common code. In other words: certain degrees of code duplication are actually good! It would be a different case if the common code of both methods needed to be consistent, i.e., if updating one but not the other would break something. This is IMO not the case here. Anyway, you can merge the functions if you want, and if needed we will just separate them again.
> As for the naming Memory Order for `co`, I found it easier to explain its semantics to others by referring to it this way. Also, Coherence was a synonym for the SC-per-Location axiom in some literature that I consulted back then.
The name "memory order" is fine by itself, but it is unnecessarily inconsistent if we use "coherence" elsewhere.
We should consistently stick to one name in our code base and just add a comment somewhere that says "this is also sometimes called X in the literature".
People are not consistent with their naming in the literature, but that doesn't mean we shouldn't be either.
As far as I can tell, the name `co` seems to be more prevalent, considering that most cat files use `co` rather than `mo`, so I would likely stay with "coherence", but that is not a "strict" preference.
In #330, code distance in `WmmEncoder` was optimized and `MemoryOrder` was renamed to `Coherence`.
Does this mean you do all the changes on the other branch instead of here? If so, then we can close this branch and directly review the other one.
Yes. The point is that the missing changes would need to be done a second time when merging `development` into `xra`.
I don't quite understand that. If we merge the changes into `development` and then merge `development` into `xra`, then the changes should end up just fine in `xra`, no? Do you expect merge conflicts that require you to manually merge the changes?
Now they should. Sorry, I got fixated on my strategy.