Dat3M Rework ExecutionModel

This is as a collection of requirements for the rework of the ExecutionModel class and its usages (specially generation of violation witnesses in *.graphviz format).

Currently, the ExecutionModel is a wrapper around an SMT model representing a (not necessarily consistent) execution with an API that allows to get information in a workable manner. We need a more general class that represents executions no matter where this execution is coming from (i.e., its "source"). Some possible sources are SMT model, svcomp violation witness, violation coming from other tools like genmc.

Internally, the class should contain at least

the executed events
the reads-from relation (rf)
the coherence relation (co)

It should also be able to give information about other base relations like po, ext, ctrl, ... One could explicitly store those relations (similar to rf and co) or extract them using information from the Event class, e.g. we can check if two events belong to the same thread with Event.getThread() and compare ids with Event.getGlobalId(). This gives us all info we need to decide the program order po. For dependency relations (data, addr, ctrl), it might be easier to save them explicitly (e.g., this is something the SMT solver will give us, but maybe not other sources)

It should also be able to compute derive relations. Right now, if one uses --method=eager, derived relations can also be queried to the SMT solver model (same as for base relations). However, if one uses --method=lazy, the SMT solver only knows about base relations. The WMMSolver will construct a ExecutionGraph from the ExecutionModel which populates the graph representation of derived relations.

The new ExecutionModel should definitely be able to populate itself from an SMT model, but the design should be flexible enough to eventually rewrite the code in /home/ponce/git/Dat3M/dartagnan/src/main/java/com/dat3m/dartagnan/witness/graphml to use the ExecutionModel. The default --method is eager, so the solution should definitely be able to populate the derived relations from the SMT model information of base ones.

Right now, ExecutionModel only keep track of memory events (loads and stores). The new class should also keep track of other events (at least Local since those help to debug low level data flow). Control-flow events like CondJump and Label can probably be skipped.

There are at least three usages of the ExecutionModel

WMMSolver: each iteration of the verification will create an execution (in most of the cases inconsistent wrt the memory model) and send this to the theory solver to check for consistency. This use case only requires capturing executed events and base relations.
Graphviz violation visualizer: once a consistent execution violating a property was found, we need to write this to a file for the user. The current code implementing this is in /home/ponce/git/Dat3M/dartagnan/src/main/java/com/dat3m/dartagnan/witness/grapviz. We need to allow at least two extra options: show derived relations (given by the user as an option --witness.doshow=a,b,c), and show local events (in many cases this will blow up the size of the graph so the option should be off by default)
SVCOMP violation witness: this use case assumes the existence of a hb relation in the cat model (file /cat/svcomp.cat satisfies this). We then create a linearization of the order and convert it to *.graphml format. The current code doing this (but not using ExecutionModel) is in /home/ponce/git/Dat3M/dartagnan/src/main/java/com/dat3m/dartagnan/witness/graphml

Sep 11 '24 12:09 hernanponcedeleon

@CapZTr as we discussed,

@ThomasHaas @natgavrilenko @JonasOberhauser if you have further comments or requirements, feel free to comment

Sep 11 '24 12:09 hernanponcedeleon

I think the following is true:

There is something like a EventModel (the current name is EventData but that is not a good name), that describes all the data relevant to the executions of an event (e.g. the value it observed and/or produced). Maybe it makes sense to have subclasses thereof to properly capture different types of events (e.g. memory, fence, local, branch, assert, ...).
Similarly, there should be a RelationModel to capture the relation interpretations in the execution graph. It is sufficient, for most cases, to only capture base relations. Derived relations can always be computed on demand via functionality we already have/need. That being said, the ExecutionModel doesn't need to care where the relations come from or if they are base/derived. In particular, it should not contain the functionality to compute derived relations at all.
The ExecutionModel consists of such EventModels, RelationModels and potentially a bunch of metadata describing the setting in which it was created (could be compilation target, the property it tries to violate (if any), the assumed progress model etc.).

Ideally, such an ExecutionModel can be dumped in possibly different formats and maybe even reparsed. When parsed, you likely will lose information about the original event that is modelled by the EventModels. So if that is a desired feature, ExecutionModels should be able to stand alone without references to a Program or Wmm. On the other hand, for refinement-based solving, the ExecutionModel should reference the original Program/Wmm in order to generate refinement clauses. This can either be achieved by having optional "source information" attached to the model, or adding a new "SourceMapping" that provides this information. This class would then only be needed for Refinement-based solving but otherwise is orthogonal to the ExecutionModel.

Sep 11 '24 19:09 ThomasHaas

An explanation about why we should handle input of the option --witness.show=a,b,c as String:

Options like --property and --method have a fixed set of possible values so that they are able to be handled as EnumSet. They have dedicated classes enumerating their values, defining the String representation of each etc. But for the option for which relations to be shown a such class cannot be implemented. The reason is that it's hard to enumerate all possible values of it. Because theoretically all names of all relations defined in all memory models belong to this set.

Technically it is also possible to handle this option as regular Set instead of EnumSet, but the disadvantage is user has to type --witness.show={a,b,c} in the command. The consistent implementation of the command processing for developers and the consistent format of the command typing for users, we have to keep at least one of them. As Set we lose both but as String we keep the latter.

Dec 06 '24 13:12 CapZTr

Technically, you can define a custom TypeConverter that converts the user-provided string in a type of your choosing during injection time. It might not be possible to overwrite the TypeConverter for collections like Set<String> though. It's likely that one would need ot define a dummy class like StringSet and then give a TypeConverter for that one. It comes also with other hassles like registration of the new TypeConverter, so I don't think its the best solution.

TypeConverter ConfigurationBuilder#converter

Dec 06 '24 13:12 ThomasHaas