
Wmm Restructuring

Open xeren opened this issue 2 years ago • 6 comments

This PR is a massive restructuring of the package com.dat3m.dartagnan.wmm. It contains several interface changes. Be careful!

Summary

The following changes are included:

  • ~~PropertyEncoder is now responsible for 'LastCoConstraints', since it manages both of the current reasons for its existence.~~
  • ~~RelationRepository is merged into Wmm.~~
  • ~~R^* is replaced with (R|id)^+.~~
  • ~~id is replaced with [V].~~
  • ~~Constraint unifies both Axiom and Definition (formerly Relation).~~
  • ~~The unused constant Relation.PostFixApprox becomes the option encoding.wmm.postFixApproximate.~~
  • Adds (yet unused) capability of n-ary unions and intersections.
  • Relation.encodeTupleSet is moved into an analysis-like manager.
  • Relation.maxTupleSet etc. are moved into RelationAnalysis.
  • Every relation in the model becomes redefinable.
  • The Relation class hierarchy is flattened such that all current implementations directly extend the common base.
  • That common base is now called Definition, which was extracted from Relation. Those two classes now refer to each other.
  • RecursiveRelation and RecursiveGroup become obsolete and are removed.

Edit: Items 1 to 5 have already been merged into development; item 6 was discarded.

xeren avatar Sep 28 '22 14:09 xeren

I'm not really a fan of the default naming for relations. I think I would prefer unnamed relations instead, identified only by their definition. For example, due to the default naming, the output of wmm.toString is much harder to read than before. Also, we want to distinguish between named and unnamed relations when flattening sequences of binary unions into n-ary unions; while explicitly checking for the default naming works, it feels awkward to do so.

I did try to implement a pass to do this flattening, but the current framework lacks both redefinitions and the capability to delete relations (and axioms). I think we should get the functionality of Wmm to a level where implementing the flattening is possible and easy, to make sure that the rework is actually worthwhile.
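For illustration, here is a minimal sketch of such a flattening pass. It uses toy stand-in types rather than the real wmm classes, but it shows the point about naming: unnamed nested unions are spliced into the parent union, while named sub-relations keep their identity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for the real wmm classes, only to illustrate the pass.
interface Def {}
record Base(String name) implements Def {}
record Named(String name, Def body) implements Def {}
record Union(List<Def> operands) implements Def {}

final class UnionFlattening {

    // Splices unnamed nested unions into their parent union,
    // but stops at named relations so their identity is preserved.
    static Def flatten(Def def) {
        if (def instanceof Union u) {
            List<Def> flat = new ArrayList<>();
            for (Def op : u.operands()) {
                Def f = flatten(op);
                if (f instanceof Union inner) {
                    flat.addAll(inner.operands());
                } else {
                    flat.add(f);
                }
            }
            return new Union(flat);
        }
        if (def instanceof Named n) {
            return new Named(n.name(), flatten(n.body()));
        }
        return def; // base relations stay as they are
    }

    public static void main(String[] args) {
        // ((a | b) | c) becomes the single 3-ary union a | b | c.
        Def nested = new Union(List.of(
                new Union(List.of(new Base("a"), new Base("b"))),
                new Base("c")));
        System.out.println(flatten(nested));
    }
}
```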

ThomasHaas avatar Oct 15 '22 08:10 ThomasHaas

Another issue I see is that the active set (encodeSet) computation is now completely hidden inside the WmmEncoder/EncodeSets. How would one go about computing partial encode sets for lazy encodings? My only guess would be to create alternative "lazy" versions of WmmEncoder and EncodeSets to get the desired behaviour.

ThomasHaas avatar Oct 15 '22 14:10 ThomasHaas

> How would one go about computing partial encode sets for lazy encodings?

It should be no problem to add methods like below to WmmEncoder. Keeping track of alreadyEncoded would be the responsibility of RefinementSolver.

```java
Map<Relation,Set<Tuple>> getEncodeSetClosure(
        Map<Relation,Set<Tuple>> reasons,
        Map<Relation,Set<Tuple>> alreadyEncoded);

BooleanFormula encodeTuples(Map<Relation,Set<Tuple>> tuples);
```

xeren avatar Oct 15 '22 16:10 xeren

Those methods would be helpful. I agree that alreadyEncoded needs to be tracked elsewhere (e.g., in the RefinementSolver).

I would probably name the second method like this:

```java
BooleanFormula encodeRelations(Map<Relation, Set<Tuple>> tuplesToBeEncoded);
```

Then the current (default) implementation can call this with the appropriate map that contains everything computed by the active set analysis (btw. we should eventually rename encode sets to active sets).

For the first one, I'm not sure why you called the first parameter reasons. I assume that map expresses what the user wants to encode, i.e., the parameters would be toBeEncoded and alreadyEncoded instead. The default implementation would pass as first argument a map filled with the encode sets provided by the axioms (+ those needed for rf and co) and an empty map for alreadyEncoded. Then the full eager encoding would be achieved by

```java
Map<Relation,Set<Tuple>> toBeEncoded = ... // add axioms etc.
Map<Relation,Set<Tuple>> alreadyEncoded = Map.of();
BooleanFormula enc = encodeTuples/Relations(getEncodeSetClosure(toBeEncoded, alreadyEncoded));
```

right? I would like this design quite a bit more than the current one. Those methods give nice flexibility to the caller and are less state-dependent.
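For instance, a lazy caller such as the RefinementSolver could then drive the encoding incrementally roughly as follows. This is a sketch only: the two methods are the ones proposed above, Dat3M and java.util imports are omitted, and LazyCaller/encodeOnDemand are placeholder names.

```java
// Sketch only: assumes WmmEncoder offers the two methods proposed above.
class LazyCaller {
    private final Map<Relation, Set<Tuple>> alreadyEncoded = new HashMap<>();

    BooleanFormula encodeOnDemand(WmmEncoder encoder, Map<Relation, Set<Tuple>> reasons) {
        // Close the requested tuples under the relation definitions,
        // skipping everything that is already on the solver.
        Map<Relation, Set<Tuple>> delta = encoder.getEncodeSetClosure(reasons, alreadyEncoded);
        // Remember the newly encoded tuples for the next refinement iteration.
        delta.forEach((rel, tuples) ->
                alreadyEncoded.computeIfAbsent(rel, k -> new HashSet<>()).addAll(tuples));
        return encoder.encodeTuples(delta);
    }
}
```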

ThomasHaas avatar Oct 15 '22 17:10 ThomasHaas

Relations that are defined via let r = ... are unnamed now. The parser adds an alias but the relation itself remains unnamed.

ThomasHaas avatar Oct 16 '22 10:10 ThomasHaas

This exposes a bug in my default implementation of Definition#substitute(Relation,Relation). I will have to add implementations to all definitions. ~~Since there is no reason for the method to be public at all, I will implement a visitor instead.~~
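To make that concrete, this is roughly what the per-definition implementations amount to (a toy example with made-up types, not Dat3M code): each concrete definition replaces the dependency from by to among its own operands.

```java
import java.util.List;

// Toy example only (not Dat3M code): each concrete definition overrides
// substitute to replace the dependency 'from' with 'to' among its own operands.
record Rel(String name) {}

record UnionDef(List<Rel> operands) {
    UnionDef substitute(Rel from, Rel to) {
        return new UnionDef(operands.stream()
                .map(r -> r.equals(from) ? to : r)
                .toList());
    }
}
```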

xeren avatar Oct 17 '22 16:10 xeren

I suggest cleaning up the WmmEncoder class a little bit. I think we cannot have methods named like encodeSAT, encodeIDL, and encodeEdgeSeq within such a monolithic class. The methods are specifically made for coherence and read-from and should be named appropriately. Also, the interleaving of helper methods and encoding methods in the program text is not nice. I would prefer it if the code were systematically grouped, with some comment lines to visualize the grouping. There are multiple ways to do so. I think it is best to keep logically related code close together so that, e.g., visitMemoryOrder is followed by encodeIDL/SAT.

Also, I'm not too sure about the naming convention. Why do we now call the coherence order "memory order"? Wouldn't it be better to stay close to the standard names?

ThomasHaas avatar Nov 17 '22 10:11 ThomasHaas

Regarding the order, I followed a field-constructor-method-first, accessibility-second approach. I would like to inline those methods, since they are basically named blocks (one usage) and currently have about 20 similar lines of code.

As for the naming Memory Order for co, I found it easier to explain its semantics to others by referring to it this way. Also Coherence was a synonym for the SC-per-Location axiom in some literature that I consulted back then.

xeren avatar Nov 17 '22 11:11 xeren

> Regarding the order, I followed a field-constructor-method-first, accessibility-second approach. I would like to inline those methods, since they are basically named blocks (one usage) and currently have about 20 similar lines of code.

That order is fine on the macroscopic level, but the ordering among the methods can still be smarter. Also, inlining just because there is a single usage is not necessarily good. Code sharing always comes with coupling, so if you inline those functions to share code, then whenever you want to adapt the SAT encoding in the future, you will always have to think about the IDL encoding as well (and vice versa) due to their common code. In other words: certain degrees of code duplication are actually good! It would be a different case if the common code of both methods needed to be consistent, i.e., if updating one but not the other would break something. This is IMO not the case here. Anyway, you can merge the functions if you want, and if needed we will just separate them again.

> As for the naming Memory Order for co, I found it easier to explain its semantics to others by referring to it this way. Also Coherence was a synonym for the SC-per-Location axiom in some literature that I consulted back then.

The name "memory order" is fine by itself, but it is unnecessarily inconsistent if we use "coherence" elsewhere. We should consistently stick to one name in our code base and just add a comment somewhere that says "this is also sometimes called X in the literature". People are not consistent with their naming in the literature, but that doesn't mean we shouldn't be either. As far as I can tell, the name co seems to be more prevalent considering that most cat files use co rather than mo, so I would likely stay with "coherence" but that is not a "strict" preference.

ThomasHaas avatar Nov 17 '22 12:11 ThomasHaas

In #330, the code in WmmEncoder was regrouped so that related methods sit closer together, and MemoryOrder was renamed to Coherence.

xeren avatar Nov 23 '22 13:11 xeren

Does this mean you do all the changes on the other branch instead of here? If so, then we can close this branch and directly review the other one.

ThomasHaas avatar Nov 23 '22 22:11 ThomasHaas

Yes. The point is that the missing changes would need to be done a second time when merging development into xra.

xeren avatar Nov 24 '22 15:11 xeren

I don't quite understand that. If we merge the changes into development and then merge development into xra, then the changes should end up just fine in xra, no? Do you expect merge conflicts that require you to manually merge the changes?

ThomasHaas avatar Nov 24 '22 15:11 ThomasHaas

Now they should. Sorry, got fixated on my strategy.

xeren avatar Nov 24 '22 16:11 xeren