Dat3M
Allowing dynamically sized mallocs
Right now, Dartagnan overapproximates the set of all possible addresses, but to do so it relies on two assumptions:
- There is a fixed and known number of mallocs
- Each malloc has a fixed and known size
Why do we do this? Well, if we have a fixed set of addresses, we can compute for each memory event the set of addresses it could potentially affect. We use this information to obtain alias information. However, the encoding of memory accesses does not depend on this at all: we can use arbitrary address expressions for each memory event that evaluate to arbitrary integers, without regard to the address space. In other words, we are not tied to a fixed memory space in the encoding, but only in the current alias analysis.
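To make that concrete, here is a rough sketch in a JavaSMT-style API (not Dartagnan's actual `RelRf` encoding; the class and method names are illustrative): the data-flow side of a read-from edge only needs equalities over the address and value expressions of the two events, so the address space never shows up.

```java
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

final class RfSketch {
    // A read-from edge from a store to a load only constrains the (arbitrary) address
    // and value expressions of the two events; no enumeration of addresses is involved.
    static BooleanFormula rfEdge(IntegerFormulaManager imgr, BooleanFormulaManager bmgr,
                                 IntegerFormula storeAddr, IntegerFormula storeValue,
                                 IntegerFormula loadAddr, IntegerFormula loadValue) {
        return bmgr.and(
                imgr.equal(storeAddr, loadAddr),    // both events access the same address
                imgr.equal(storeValue, loadValue)); // the load observes the stored value
    }
}
```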
Let's assume that we drop the second assumption, that is, we still have a fixed set of mallocs but each of them is dynamically sized. For each malloc k, we can express the malloced memory region as the interval `[base_k, base_k + size_k]`. Now let us create a symbolic address for each of the bases and let `size_k` be given by an arbitrary integer expression.
We can now encode `0 < base_1, base_1 + size_1 < base_2, base_2 + size_2 < base_3, ...`
This will guarantee that no matter what values the size expressions evaluate to, we have non-overlapping memory.
In the absence of UB, ordering the bases in this manner is also sound.
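A minimal sketch of how these ordering constraints could be built with a JavaSMT-style API (the class, the variable naming scheme, and the parameters are made up for illustration; this is not Dartagnan's actual encoder):

```java
import java.util.ArrayList;
import java.util.List;
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

final class NonOverlapSketch {
    // Encodes 0 < base_1, base_1 + size_1 < base_2, ..., base_{n-1} + size_{n-1} < base_n.
    // sizes.get(k) may be any integer formula built from the k-th malloc's size expression.
    static BooleanFormula encode(IntegerFormulaManager imgr, BooleanFormulaManager bmgr,
                                 List<IntegerFormula> sizes) {
        List<BooleanFormula> constraints = new ArrayList<>();
        IntegerFormula prevEnd = imgr.makeNumber(0); // "end" of the previous region; 0 before the first base
        for (int k = 0; k < sizes.size(); k++) {
            IntegerFormula base = imgr.makeVariable("base_" + k); // fresh symbolic base address
            constraints.add(imgr.lessThan(prevEnd, base));        // region k starts after region k-1 ends
            prevEnd = imgr.add(base, sizes.get(k));               // end of region k
        }
        return bmgr.and(constraints);
    }
}
```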
What should the alias analysis do now? Well, we need to properly track bases. If two operations access different bases, they can never alias (assuming no undefined behaviour). If they access the same base, we need to compare the offsets to determine whether they can alias. If we cannot find any set of possible bases for some memory access, we can still assume the whole address space. If we could establish that all pointers in the program point to some base (i.e. to some malloced object), we could improve our alias analysis even further: if we load a pointer and do not know its value, a subsequent load from that pointer address still has to access some base, so we can still say that accesses with different offsets cannot alias, even if we do not know their bases. This is actually quite frequently the case, unless we have pointers into a struct/array.
So if I'm not mistaken, the only reason we do not have dynamically sized memory is our alias analysis. And I believe our alias analyses could be adapted to reason about `base + offset` (I think it does this already to some extent).
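For illustration, a rough sketch of what such a base+offset may-alias check could look like (the `Access` record and the "empty base set means unknown" convention are my own assumptions, not the current analysis):

```java
import java.util.Optional;
import java.util.Set;
import com.google.common.collect.Sets;

final class BaseOffsetAliasSketch {
    // Hypothetical access summary: which bases the address may point to (empty = unknown),
    // plus a constant offset from that base if one is known.
    record Access(Set<String> possibleBases, Optional<Integer> constantOffset) {}

    // May the two accesses touch the same location? (Assumes no undefined behaviour and
    // that every pointer points to the base of some malloced object.)
    static boolean mayAlias(Access a, Access b) {
        boolean basesKnown = !a.possibleBases().isEmpty() && !b.possibleBases().isEmpty();
        // Accesses to disjoint sets of bases can never alias.
        if (basesKnown && Sets.intersection(a.possibleBases(), b.possibleBases()).isEmpty()) {
            return false;
        }
        // Same (or unknown) base: distinct constant offsets cannot alias,
        // because every offset is relative to some base.
        if (a.constantOffset().isPresent() && b.constantOffset().isPresent()
                && !a.constantOffset().equals(b.constantOffset())) {
            return false;
        }
        return true; // otherwise we conservatively assume they may alias
    }
}
```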
I thought a little bit more about this and found a problem with the handling of `init` events. An unbounded number of addresses would need an unbounded number of `init` writes, which is problematic. One would need to introduce a symbolic `init` event that abstracts all `init` events simultaneously. While it is easy to do this for the data-flow part of the encoding, it is not at all clear how one would do this for the wmm part of the encoding. In particular, the symbolic `init` event would need to have coherence edges to all writes (since it simultaneously accesses all addresses), which causes problems like `fr = rf^-1;co` being able to relate events with distinct addresses when going over the `init` event (e.g., a read from address x could become `fr`-related to a write to a different address y via the symbolic `init`). To fix this, one would need `fr = rf^-1;co & loc` instead.
Furthermore, we would need to change the fact that `init` writes all belong to distinct threads.
An alternative would be to assume well-initialized programs so that init reads are not possible (and hence we do not need to encode them at all). This is not a nice solution though. Maybe there are further alternatives if we somehow introduce quantifiers to instantiate `init` events on demand (we want to avoid quantifiers, though).
@hernanponcedeleon and I talked about possible solutions before. The easiest we've come up with works as follows.
- We create init events per read event so that each read is guaranteed to have an associated init. Furthermore, as programs are bounded, this automatically bounds the number of init events we need.
- If we have must-alias information, we can merge multiple init events into a single one (a rough sketch is given after this list).
- Since we now have situations where different init events actually belong to the same address (e.g. if their associated reads end up aliasing), we must handle this situation. We need to make sure that these multiple init events are not distinguishable and appear as if there was only a single init.
- To do so, we have to make sure that init events to the same address have identical `co` and `rf` edges. The former should be obtained for free by assigning clock variable 0 to all init events (as we already do). The latter needs to be solved by creating multiple `rf` edges from each init event to each read that actually reads an init value (i.e., an init-read reads from all same-address init events simultaneously).
- Doing this, all same-address init events should be indistinguishable as they show the same behavior under each relation, with the exception of `id` (but if necessary, we could adapt `id` as well).
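Here is a rough sketch of the first two bullets, i.e., creating one init event per read and merging inits whose reads must-alias (the `Read`/`Init` types and the `mustAlias` predicate are placeholders, not Dartagnan's event classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class InitPerReadSketch {
    interface Read {}                            // placeholder for a load event
    record Init(List<Read> coveredReads) {}      // placeholder for an init event and the reads it serves

    // Start with one init per read, then merge a read into an existing init if it
    // must-aliases some read already covered by that init. Since programs are bounded,
    // the number of reads (and hence of inits) is bounded as well.
    static List<Init> createInits(List<Read> reads, BiPredicate<Read, Read> mustAlias) {
        List<Init> inits = new ArrayList<>();
        for (Read r : reads) {
            Init target = null;
            for (Init init : inits) {
                if (init.coveredReads().stream().anyMatch(other -> mustAlias.test(r, other))) {
                    target = init;
                    break;
                }
            }
            if (target != null) {
                target.coveredReads().add(r);                       // merge with an aliasing read's init
            } else {
                inits.add(new Init(new ArrayList<>(List.of(r))));   // fresh init event for this read
            }
        }
        return inits;
    }
}
```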
I think I came up with an idea that might solve this problem (and hopefully also #132 and #198). It will also have an impact on #137.
As I see it, addresses are needed for
(1) creating `Init` events here, and
(2) generating information used to decide aliasing between `MemEvent`s here.
As stated in #137, this way of computing alias information is over-restrictive. For the time being, i.e. until #137 is properly solved, I think we can use the solution below to keep our current alias analysis while not relying on knowing all existing addresses.
The set of all addresses is used because `MemEvent.canAddressTheSameLocation` is implemented as the intersection of the `MaxAddressSet`s of both events (i.e. the set of all addresses is used as the ⊤ (top) element of the lattice, as in traditional alias analysis). However, anytime we set `maxTupleSet` to all addresses in `AliasAnalysis`, we can just set it to empty and modify `MemEvent.canAddressTheSameLocation` as follows:
```java
return !Sets.intersection(e1.getMaxAddressSet(), e2.getMaxAddressSet()).isEmpty()
        || e1.getMaxAddressSet().isEmpty() || e2.getMaxAddressSet().isEmpty();
```
With this change, we get rid of the need to know all addresses for (2) and we just need to deal with (1).
Instead of having one `Init` event for each address, we can have a single event (thus keeping the constraint that each read must have a single write). However, we need to treat the new `Init` differently in the encoding of `RelRf`.
For `Store`s, we keep the encoding as it is (both the addresses and the values should match).
For this new `Init`, instead of initialising `memValueExpr = value.toIntFormula(ctx);`, we use a UF `init_value: Integer -> Integer` on the right-hand side (mapping addresses to values). For those cases where we know the initial value of the address (i.e., coming from `ProgramBuilder.initLocEqConst()`), we add an axiom `init_value(address) = value`. We further add an axiom saying that for any other address the function returns zero (to keep our current semantics).
Coming back to the encoding of `RelRf`, instead of saying that the addresses and the values should match, we say that
`r.getMemValueExpr() = init_value(r.getMemAddressExpr())`
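For concreteness, here is a hedged sketch of the `init_value` UF and its axioms in a JavaSMT-style API (the class and method names are illustrative; the "zero for all other addresses" axiom would need a quantifier or a case split, so it is omitted):

```java
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.FormulaManager;
import org.sosy_lab.java_smt.api.FormulaType;
import org.sosy_lab.java_smt.api.FunctionDeclaration;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;
import org.sosy_lab.java_smt.api.UFManager;

final class InitValueSketch {
    // Returns the conjunction of (a) one axiom for a statically known initial value and
    // (b) the rf-style constraint "readValue = init_value(readAddr)" for a single read.
    static BooleanFormula example(FormulaManager fmgr, long knownAddress, long knownValue,
                                  IntegerFormula readAddr, IntegerFormula readValue) {
        IntegerFormulaManager imgr = fmgr.getIntegerFormulaManager();
        BooleanFormulaManager bmgr = fmgr.getBooleanFormulaManager();
        UFManager ufmgr = fmgr.getUFManager();

        // init_value : Integer -> Integer (uninterpreted)
        FunctionDeclaration<IntegerFormula> initValue =
                ufmgr.declareUF("init_value", FormulaType.IntegerType, FormulaType.IntegerType);

        // Axiom for an address whose initial value is known (e.g. from initLocEqConst()).
        BooleanFormula knownInit = imgr.equal(
                ufmgr.callUF(initValue, imgr.makeNumber(knownAddress)),
                imgr.makeNumber(knownValue));

        // A read that reads-from the init: its value equals init_value at its address.
        BooleanFormula readsFromInit =
                imgr.equal(readValue, ufmgr.callUF(initValue, readAddr));

        return bmgr.and(knownInit, readsFromInit);
    }
}
```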
I think that misses the problem. Unbounded memory is no problem for the SMT encoding. We have plenty of options there.
The problem is that we need actual events because we need edge variables between those events. With a single init node, you are bound to run into problems when e.g. evaluating `rf^-1;rf`, which would end up relating all reads that read any init value.
Put differently, the size of the execution graph is unbounded if there is an unbounded number of addresses that need to get initialized. And I don't think what you propose helps in this regard.
You are right (I should have read the 2nd msg in the issue). However, I think that the solution in the 3rd msg combined with (2) should also remove the need to explicitly allocate every address.
Regarding the solution from the 3rd msg ... I think instead of creating one `Init` event per `Load`, it should be sufficient to create one `Init` event per `IExpr` used in any `Load.getAddress()` if the expression does not contain registers (for the remaining cases we create several `Init`s). The implementation of `Register.equals()` marks two objects as equal if their `name` and `threadId` match. However, since registers can be overwritten, this would cause problems if we just create one `Init` for `e2` and `e4` below:
```
e1: r1 <- 1;
e2: r2 = Load(r1);
e3: r1 <- 2;
e4: r2 = Load(r1);
```
The problem that we have is that whenever there is an array or structure access, boogie usually generates two `Local`s (one for loading the base address and another to add the offset) and only then the `MemEvent` using the last register:
```
r1 <- &mem0;
r2 <- r1 + 4;
r3 = load(r2);
```
I think that once we have the `ProgramProcessing` interface fixed, we should create a processing pass that converts the program above to:
```
r1 <- &mem0;
r2 <- &mem0 + 4;
r3 = load(&mem0 + 4);
```
This is just constant propagation, but if we do this after unrolling, a single iteration should be enough. In general, this should give us precise information about all registers except those coming from a `Load`. For the `Clock` benchmarks we should have fully precise information. For `M&S` or `SafeStack` we will not.
We could later have another pass that removes unused code (here the first two instructions).
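To make the pass concrete, here is a toy sketch over a made-up, straight-line IR (the `Local`/`Load` records below are illustrative only and much simpler than Dartagnan's events): registers holding a known `base + constant` value are substituted into later instructions, and a load invalidates what we know about its destination register.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConstantPropagationSketch {
    interface Expr {}
    record BaseOffset(String base, int offset) implements Expr {} // e.g. "&mem0" + 4
    record RegRef(String reg) implements Expr {}                   // a register read

    interface Instr {}
    record Local(String dst, Expr value, int addend) implements Instr {} // dst <- value + addend
    record Load(String dst, Expr address) implements Instr {}            // dst = load(address)

    // Rewrites the program in place; intended to run after unrolling, so one pass suffices.
    static void run(List<Instr> program) {
        Map<String, BaseOffset> known = new HashMap<>(); // registers with a known constant address
        for (int i = 0; i < program.size(); i++) {
            Instr instr = program.get(i);
            if (instr instanceof Local loc) {
                BaseOffset in = resolve(loc.value(), known);
                if (in != null) {
                    BaseOffset out = new BaseOffset(in.base(), in.offset() + loc.addend());
                    known.put(loc.dst(), out);                      // remember the constant value
                    program.set(i, new Local(loc.dst(), out, 0));   // r2 <- &mem0 + 4
                } else {
                    known.remove(loc.dst());                        // value unknown: forget old fact
                }
            } else if (instr instanceof Load load) {
                BaseOffset addr = resolve(load.address(), known);
                if (addr != null) {
                    program.set(i, new Load(load.dst(), addr));     // r3 = load(&mem0 + 4)
                }
                known.remove(load.dst());                           // loaded values stay unknown
            }
        }
    }

    private static BaseOffset resolve(Expr e, Map<String, BaseOffset> known) {
        if (e instanceof BaseOffset b) return b;
        if (e instanceof RegRef r) return known.get(r.reg());
        return null;
    }
}
```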
Creating an init event per `Load.getAddress()` is basically the same as creating an init event per `Load` and merging init events based on alias information. The latter needs less special treatment and is even more general in that it can merge init events that have different expressions as long as they alias.
Your proposed solution will almost always only merge init events that have a constant `IExpr`, but those should hopefully be found to must-alias anyway. But if expression comparison can give us more accuracy, then maybe this is something that should be done in the alias analysis:
- For each thread, go through all the loads and stores and compare their address expressions pairwise
- If the expressions match, we have must-alias information (care has to be taken for expressions containing registers: we need to check whether a register changes between the two memory accesses).
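A small sketch of this check (the types and the `registerChangedBetween` oracle are hypothetical; a real implementation would walk the thread's events between the two accesses):

```java
import java.util.Objects;
import java.util.Set;

final class SyntacticMustAliasSketch {
    // Hypothetical view of an address expression: equality plus the registers it mentions.
    interface AddressExpr {
        Set<String> registers();
    }

    interface Oracle {
        // true if the register may be (re)assigned between the two accesses of the thread
        boolean registerChangedBetween(String register, int firstEvent, int secondEvent);
    }

    // Same-thread accesses must alias if their address expressions are syntactically equal
    // and none of the registers they mention is overwritten in between.
    static boolean mustAlias(AddressExpr a, int eventA, AddressExpr b, int eventB, Oracle oracle) {
        if (!Objects.equals(a, b)) {
            return false; // different expressions: no syntactic conclusion possible
        }
        return a.registers().stream()
                .noneMatch(reg -> oracle.registerChangedBetween(reg, eventA, eventB));
    }
}
```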
I want to give an update on this, since we recently removed the requirement for init events in #624; those events were the main problem in supporting dynamically sized mallocs. With those out of the way, let me sketch what needs to be done to add the mallocs.
(1) The `address` field in `MemoryObject` needs to go. The `MemoryObject` itself already represents its address, so the field is questionable to begin with. Instead, `EncodingContext.address(MemoryObject)` should return an SMT variable that represents the address of the memory object.
(2) The `size` field in `MemoryObject` needs to become a proper `Expression`. Since this size expression can refer to registers, we need to remember the allocation site (i.e., the event that allocates the memory object) so that we know where to evaluate the expression. This likely means that we do not want to rewrite `Alloc` events to `Local` events as we do right now. The alias analysis will need to be updated to handle `Alloc` events (should be an easy change).
(3) The `ProgramEncoder` should generate an encoding that makes sure that each memory object is well-aligned and that no two memory objects overlap. The encoding could look something like this:
```
// Suppose the memory objects are ordered and indexed by i=1 to i=n
forall i: (memObj_i % alignment = 0 && memObj_i + size_i <= memObj_{i+1})
```
This encoding can be optimized in two ways. First, if we order all statically-sized memory objects to come first, we can give them constant addresses (i.e. encode `memObj_i = constantAddr_i`) just like we do now and skip the encoding of their constraints. Secondly, since alignment is usually a power of two, `memObj_i % alignment = 0` is equivalent to something like `extract(memObj_i, numAlignmentBits) = 0` (the SMT solver might already do this transformation on its own).
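As a rough JavaSMT-style sketch of the basic constraint (the class, method, and parameter names are made up, and the two optimizations above are not applied):

```java
import java.util.ArrayList;
import java.util.List;
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

final class MemoryLayoutSketch {
    // addresses.get(i) is the SMT variable for memObj_i, sizes.get(i) its (possibly symbolic) size.
    static BooleanFormula encode(IntegerFormulaManager imgr, BooleanFormulaManager bmgr,
                                 List<IntegerFormula> addresses, List<IntegerFormula> sizes,
                                 long alignment) {
        List<BooleanFormula> constraints = new ArrayList<>();
        IntegerFormula align = imgr.makeNumber(alignment);
        IntegerFormula zero = imgr.makeNumber(0);
        for (int i = 0; i < addresses.size(); i++) {
            // memObj_i % alignment = 0
            constraints.add(imgr.equal(imgr.modulo(addresses.get(i), align), zero));
            // memObj_i + size_i <= memObj_{i+1}
            if (i + 1 < addresses.size()) {
                constraints.add(imgr.lessOrEquals(
                        imgr.add(addresses.get(i), sizes.get(i)), addresses.get(i + 1)));
            }
        }
        return bmgr.and(constraints);
    }
}
```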
Aspects that need to be considered:
- What happens if `size_i` is negative, or `size_i` is so large that we overflow (assuming a 64-bit address range)?
- What happens if the `Alloc` does not get executed in the first place? What are the constraints on `memObj_i` and `size_i` then? Do we need to guard the encoding with something like `exec(Alloc) => ...`?