Dat3M
Allowing dynamically sized mallocs
Right now, Dartagnan overapproximates the set of all possible addresses, but to do so it relies on two assumptions:
- There is a fixed and known number of mallocs
- Each malloc has a fixed and known size
Why do we do this? Well, if we have a fixed set of addresses, we can compute for each memory event the set of addresses it could potentially affect. We use this information to obtain alias information. However, the encoding of memory accesses does not depend on this at all: we can use arbitrary address expressions for each memory event that evaluate to arbitrary integers, without regard to the address space. In other words, we are not tied to a fixed memory space in the encoding, but only in the current alias analysis.
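To make that concrete, here is a rough sketch in a JavaSMT-style API (not Dartagnan's actual `RelRf` encoding; the class and method names are illustrative): the data-flow side of a read-from edge only needs equalities over the address and value expressions of the two events, so the address space never shows up.

```java
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

final class RfSketch {
    // A read-from edge from a store to a load only constrains the (arbitrary) address
    // and value expressions of the two events; no enumeration of addresses is involved.
    static BooleanFormula rfEdge(IntegerFormulaManager imgr, BooleanFormulaManager bmgr,
                                 IntegerFormula storeAddr, IntegerFormula storeValue,
                                 IntegerFormula loadAddr, IntegerFormula loadValue) {
        return bmgr.and(
                imgr.equal(storeAddr, loadAddr),    // both events access the same address
                imgr.equal(storeValue, loadValue)); // the load observes the stored value
    }
}
```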
Let's assume that we drop the second assumption, that is, we still have a fixed set of mallocs but each of them is dynamically sized. For each malloc k, we can express the malloced memory region as the interval `[base_k, base_k + size_k]`. Now let us create a symbolic address for each of the bases and let `size_k` be given by an arbitrary integer expression.
We can now encode `0 < base_1, base_1 + size_1 < base_2, base_2 + size_2 < base_3, ...`
This will guarantee that no matter what values the size expressions evaluate to, we have non-overlapping memory.
In the absence of UB, ordering the bases in this manner is also sound.
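A minimal sketch of how these ordering constraints could be built with a JavaSMT-style API (the class, the variable naming scheme, and the parameters are made up for illustration; this is not Dartagnan's actual encoder):

```java
import java.util.ArrayList;
import java.util.List;
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

final class NonOverlapSketch {
    // Encodes 0 < base_1, base_1 + size_1 < base_2, ..., base_{n-1} + size_{n-1} < base_n.
    // sizes.get(k) may be any integer formula built from the k-th malloc's size expression.
    static BooleanFormula encode(IntegerFormulaManager imgr, BooleanFormulaManager bmgr,
                                 List<IntegerFormula> sizes) {
        List<BooleanFormula> constraints = new ArrayList<>();
        IntegerFormula prevEnd = imgr.makeNumber(0); // "end" of the previous region; 0 before the first base
        for (int k = 0; k < sizes.size(); k++) {
            IntegerFormula base = imgr.makeVariable("base_" + k); // fresh symbolic base address
            constraints.add(imgr.lessThan(prevEnd, base));        // region k starts after region k-1 ends
            prevEnd = imgr.add(base, sizes.get(k));               // end of region k
        }
        return bmgr.and(constraints);
    }
}
```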
What should the alias analysis do now? Well, we need to properly track bases. If two operations access different bases, they can never alias (assuming no undefined behaviour). If they access the same base, we need to compare the offsets to determine whether they can alias. If we cannot find any set of possible bases for some memory access, we can still assume the whole address space. If we could establish that all pointers in the program point to some base (i.e. to some malloced object), we could improve our alias analysis even further: if we load a pointer and do not know its value, a subsequent load from that pointer address still has to access some base, so we can still say that accesses with different offsets cannot alias, even if we do not know their bases. This is actually quite frequently the case, unless we have pointers into a struct/array.
So if I'm not mistaken, the only reason we do not have dynamically sized memory is our alias analysis. And I believe our alias analyses could be adapted to reason about `base + offset` (I think it does this already to some extent).
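For illustration, a rough sketch of what such a base+offset may-alias check could look like (the `Access` record and the "empty base set means unknown" convention are my own assumptions, not the current analysis):

```java
import java.util.Optional;
import java.util.Set;
import com.google.common.collect.Sets;

final class BaseOffsetAliasSketch {
    // Hypothetical access summary: which bases the address may point to (empty = unknown),
    // plus a constant offset from that base if one is known.
    record Access(Set<String> possibleBases, Optional<Integer> constantOffset) {}

    // May the two accesses touch the same location? (Assumes no undefined behaviour and
    // that every pointer points to the base of some malloced object.)
    static boolean mayAlias(Access a, Access b) {
        boolean basesKnown = !a.possibleBases().isEmpty() && !b.possibleBases().isEmpty();
        // Accesses to disjoint sets of bases can never alias.
        if (basesKnown && Sets.intersection(a.possibleBases(), b.possibleBases()).isEmpty()) {
            return false;
        }
        // Same (or unknown) base: distinct constant offsets cannot alias,
        // because every offset is relative to some base.
        if (a.constantOffset().isPresent() && b.constantOffset().isPresent()
                && !a.constantOffset().equals(b.constantOffset())) {
            return false;
        }
        return true; // otherwise we conservatively assume they may alias
    }
}
```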
I thought a little bit more about this and found a problem with the handling of `init` events. An unbounded number of addresses would need an unbounded number of `init` writes, which is problematic. One would need to introduce a symbolic `init` event that abstracts all `init` events simultaneously. While it is easy to do this for the data-flow part of the encoding, it is not at all clear how one would do this for the wmm part of the encoding. In particular, the symbolic `init` event would need to have coherence edges to all writes (since it simultaneously accesses all addresses), which causes problems like `fr = rf^-1;co` being able to relate events with distinct addresses when going over the `init` event (e.g., a read from address x could become `fr`-related to a write to a different address y via the symbolic `init`). To fix this, one would need `fr = rf^-1;co & loc` instead.
Furthermore, we would need to change the fact that `init` writes all belong to distinct threads.
An alternative would be to assume well-initialized programs so that init reads are not possible (and hence we do not need to encode them at all). This is not a nice solution though. Maybe there are further alternatives if we somehow introduce quantifiers to instantiate `init` events on demand (we want to avoid quantifiers, though).
@hernanponcedeleon and I talked about possible solutions before. The easiest we've come up with works as follows.
- We create init events per read event so that each read is guaranteed to have an associated init. Furthermore, as programs are bounded, this automatically bounds the number of init events we need.
- If we have must-alias information, we can merge multiple init events into a single one (a rough sketch is given after this list).
- Since we now have situations where different init events actually belong to the same address (e.g. if their associated reads end up aliasing), we must handle this situation. We need to make sure that these multiple init events are not distinguishable and appear as if there was only a single init.
- To do so, we have to make sure that init events to the same address have identical `co` and `rf` edges. The former should be obtained for free by assigning clock variable 0 to all init events (as we already do). The latter needs to be solved by creating multiple `rf` edges from each init event to each read that actually reads an init value (i.e., an init-read reads from all same-address init events simultaneously).
- Doing this, all same-address init events should be indistinguishable as they show the same behavior under each relation, with the exception of `id` (but if necessary, we could adapt `id` as well).
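Here is a rough sketch of the first two bullets, i.e., creating one init event per read and merging inits whose reads must-alias (the `Read`/`Init` types and the `mustAlias` predicate are placeholders, not Dartagnan's event classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class InitPerReadSketch {
    interface Read {}                            // placeholder for a load event
    record Init(List<Read> coveredReads) {}      // placeholder for an init event and the reads it serves

    // Start with one init per read, then merge a read into an existing init if it
    // must-aliases some read already covered by that init. Since programs are bounded,
    // the number of reads (and hence of inits) is bounded as well.
    static List<Init> createInits(List<Read> reads, BiPredicate<Read, Read> mustAlias) {
        List<Init> inits = new ArrayList<>();
        for (Read r : reads) {
            Init target = null;
            for (Init init : inits) {
                if (init.coveredReads().stream().anyMatch(other -> mustAlias.test(r, other))) {
                    target = init;
                    break;
                }
            }
            if (target != null) {
                target.coveredReads().add(r);                       // merge with an aliasing read's init
            } else {
                inits.add(new Init(new ArrayList<>(List.of(r))));   // fresh init event for this read
            }
        }
        return inits;
    }
}
```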
I think I came up with an idea that might solve this problem (and hopefully also #132 and #198). It will also have an impact on #137.
As I see it, addresses are needed for
(1) creating `Init` events here, and
(2) generating information used to decide aliasing between `MemEvent`s here.
As stated in #137, this way of computing alias information is over-restrictive. For the time being, i.e. until #137 is properly solved, I think we can use the solution below to keep our current alias analysis while not relying on knowing all existing addresses.
The set of all addresses is used because `MemEvent.canAddressTheSameLocation` is implemented as the intersection of the `MaxAddressSet`s of both events (i.e. the set of all addresses is used as the ⊤ (top) element of the lattice, as in traditional alias analysis). However, anytime we set `maxTupleSet` to all addresses in `AliasAnalysis`, we can just set it to empty and modify `MemEvent.canAddressTheSameLocation` as follows:
```java
return !Sets.intersection(e1.getMaxAddressSet(), e2.getMaxAddressSet()).isEmpty()
        || e1.getMaxAddressSet().isEmpty() || e2.getMaxAddressSet().isEmpty();
```
With this change, we get rid of the need to know all addresses for (2) and we just need to deal with (1).
Instead of having one `Init` event for each address, we can have a single event (thus keeping the constraint that each read must have a single write). However, we need to treat the new `Init` differently in the encoding of `RelRf`.
For `Store`s, we keep the encoding as it is (both the addresses and the values should match).
For this new `Init`, instead of initialising `memValueExpr = value.toIntFormula(ctx);`, we use a UF `init_value: Integer -> Integer` on the right-hand side (mapping addresses to values). For those cases where we know the initial value of the address (i.e., coming from `ProgramBuilder.initLocEqConst()`), we add an axiom `init_value(address) = value`. We further add an axiom saying that for any other address the function returns zero (to keep our current semantics).
Coming back to the encoding of `RelRf`, instead of saying that the addresses and the values should match, we say that
`r.getMemValueExpr() = init_value(r.getMemAddressExpr())`
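For concreteness, here is a hedged sketch of the `init_value` UF and its axioms in a JavaSMT-style API (the class and method names are illustrative; the "zero for all other addresses" axiom would need a quantifier or a case split, so it is omitted):

```java
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.FormulaManager;
import org.sosy_lab.java_smt.api.FormulaType;
import org.sosy_lab.java_smt.api.FunctionDeclaration;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;
import org.sosy_lab.java_smt.api.UFManager;

final class InitValueSketch {
    // Returns the conjunction of (a) one axiom for a statically known initial value and
    // (b) the rf-style constraint "readValue = init_value(readAddr)" for a single read.
    static BooleanFormula example(FormulaManager fmgr, long knownAddress, long knownValue,
                                  IntegerFormula readAddr, IntegerFormula readValue) {
        IntegerFormulaManager imgr = fmgr.getIntegerFormulaManager();
        BooleanFormulaManager bmgr = fmgr.getBooleanFormulaManager();
        UFManager ufmgr = fmgr.getUFManager();

        // init_value : Integer -> Integer (uninterpreted)
        FunctionDeclaration<IntegerFormula> initValue =
                ufmgr.declareUF("init_value", FormulaType.IntegerType, FormulaType.IntegerType);

        // Axiom for an address whose initial value is known (e.g. from initLocEqConst()).
        BooleanFormula knownInit = imgr.equal(
                ufmgr.callUF(initValue, imgr.makeNumber(knownAddress)),
                imgr.makeNumber(knownValue));

        // A read that reads-from the init: its value equals init_value at its address.
        BooleanFormula readsFromInit =
                imgr.equal(readValue, ufmgr.callUF(initValue, readAddr));

        return bmgr.and(knownInit, readsFromInit);
    }
}
```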
I think that misses the problem. Unbounded memory is no problem for the SMT encoding. We have plenty of options there.
The problem is that we need actual events because we need edge variables between those events. With a single init node, you are bound to run into problems when e.g. evaluating `rf^-1;rf`, which would end up relating all reads that read any init value.
Put differently, the size of the execution graph is unbounded if there is an unbounded number of addresses that need to get initialized. And I don't think what you propose helps in this regard.
You are right (I should have read the 2nd msg in the issue). However, I think that the solution in the 3rd msg combined with (2) should also remove the need to explicitly allocate every address.
Regarding the solution from the 3rd msg ... I think instead of creating one `Init` event per `Load`, it should be sufficient to create one `Init` event per `IExpr` used in any `Load.getAddress()` if the expression does not contain registers (for the remaining cases we create several `Init`s). The implementation of `Register.equals()` marks two objects as equal if their `name` and `threadId` match. However, since registers can be overwritten, this would cause problems if we just create one `Init` for `e2` and `e4` below:
```
e1: r1 <- 1;
e2: r2 = Load(r1);
e3: r1 <- 2;
e4: r2 = Load(r1);
```
The problem that we have is that whenever there is an array or structure access, boogie usually generates two `Local`s (one for loading the base address and another to add the offset) and only then the `MemEvent` using the last register:
```
r1 <- &mem0;
r2 <- r1 + 4;
r3 = load(r2);
```
I think that once we have the `ProgramProcessing` interface fixed, we should create a processing pass that converts the program above to:
```
r1 <- &mem0;
r2 <- &mem0 + 4;
r3 = load(&mem0 + 4);
```
This is just constant propagation, but if we do this after unrolling, a single iteration should be enough. In general, this should give us precise information about all registers except those coming from a `Load`. For the `Clock` benchmarks we should have fully precise information. For `M&S` or `SafeStack` we will not.
We could later have another pass that removes unused code (here the first two instructions).
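To make the pass concrete, here is a toy sketch over a made-up, straight-line IR (the `Local`/`Load` records below are illustrative only and much simpler than Dartagnan's events): registers holding a known `base + constant` value are substituted into later instructions, and a load invalidates what we know about its destination register.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConstantPropagationSketch {
    interface Expr {}
    record BaseOffset(String base, int offset) implements Expr {} // e.g. "&mem0" + 4
    record RegRef(String reg) implements Expr {}                   // a register read

    interface Instr {}
    record Local(String dst, Expr value, int addend) implements Instr {} // dst <- value + addend
    record Load(String dst, Expr address) implements Instr {}            // dst = load(address)

    // Rewrites the program in place; intended to run after unrolling, so one pass suffices.
    static void run(List<Instr> program) {
        Map<String, BaseOffset> known = new HashMap<>(); // registers with a known constant address
        for (int i = 0; i < program.size(); i++) {
            Instr instr = program.get(i);
            if (instr instanceof Local loc) {
                BaseOffset in = resolve(loc.value(), known);
                if (in != null) {
                    BaseOffset out = new BaseOffset(in.base(), in.offset() + loc.addend());
                    known.put(loc.dst(), out);                      // remember the constant value
                    program.set(i, new Local(loc.dst(), out, 0));   // r2 <- &mem0 + 4
                } else {
                    known.remove(loc.dst());                        // value unknown: forget old fact
                }
            } else if (instr instanceof Load load) {
                BaseOffset addr = resolve(load.address(), known);
                if (addr != null) {
                    program.set(i, new Load(load.dst(), addr));     // r3 = load(&mem0 + 4)
                }
                known.remove(load.dst());                           // loaded values stay unknown
            }
        }
    }

    private static BaseOffset resolve(Expr e, Map<String, BaseOffset> known) {
        if (e instanceof BaseOffset b) return b;
        if (e instanceof RegRef r) return known.get(r.reg());
        return null;
    }
}
```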
Creating an init event per `Load.getAddress()` is basically the same as creating an init event per `Load` and merging init events based on alias information. The latter needs less special treatment and is even more general in that it can merge init events that have different expressions as long as they alias.
Your proposed solution will almost always only merge init events that have a constant `IExpr`, but those should hopefully be found to must-alias anyway. But if expression comparison can give us more accuracy, then maybe this is something that should be done in the alias analysis:
- For each thread, go through all the loads and stores and compare their address expressions pairwise
- If the expressions match, we have must-alias information (care has to be taken for expressions containing registers: we need to check whether a register changes between the two memory accesses).
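A small sketch of this check (the types and the `registerChangedBetween` oracle are hypothetical; a real implementation would walk the thread's events between the two accesses):

```java
import java.util.Objects;
import java.util.Set;

final class SyntacticMustAliasSketch {
    // Hypothetical view of an address expression: equality plus the registers it mentions.
    interface AddressExpr {
        Set<String> registers();
    }

    interface Oracle {
        // true if the register may be (re)assigned between the two accesses of the thread
        boolean registerChangedBetween(String register, int firstEvent, int secondEvent);
    }

    // Same-thread accesses must alias if their address expressions are syntactically equal
    // and none of the registers they mention is overwritten in between.
    static boolean mustAlias(AddressExpr a, int eventA, AddressExpr b, int eventB, Oracle oracle) {
        if (!Objects.equals(a, b)) {
            return false; // different expressions: no syntactic conclusion possible
        }
        return a.registers().stream()
                .noneMatch(reg -> oracle.registerChangedBetween(reg, eventA, eventB));
    }
}
```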
I want to give an update on this, since we recently removed the requirement for init events in #624; those events were the main problem in supporting dynamically sized mallocs. With those out of the way, let me sketch what needs to be done to add the mallocs.
(1) The `address` field in `MemoryObject` needs to go. The `MemoryObject` itself already represents its address, so the field is questionable to begin with. Instead, `EncodingContext.address(MemoryObject)` should return an SMT variable that represents the address of the memory object.
(2) The `size` field in `MemoryObject` needs to become a proper `Expression`. Since this size expression can refer to registers, we need to remember the allocation site (i.e., the event that allocates the memory object) so that we know where to evaluate the expression. This likely means that we do not want to rewrite `Alloc` events to `Local` events as we do right now. The alias analysis will need to be updated to handle `Alloc` events (should be an easy change).
(3) The `ProgramEncoder` should generate an encoding that makes sure that each memory object is well-aligned and that no two memory objects overlap. The encoding could look something like this:
```
// Suppose the memory objects are ordered and indexed by i=1 to i=n
forall i: (memObj_i % alignment = 0 && memObj_i + size_i <= memObj_{i+1})
```
This encoding can be optimized in two ways. First, if we order all statically-sized memory objects to come first, we can give them constant addresses (i.e. encode `memObj_i = constantAddr_i`) just like we do now and skip the encoding of their constraints. Secondly, since alignment is usually a power of two, `memObj_i % alignment = 0` is equivalent to something like `extract(memObj_i, numAlignmentBits) = 0` (the SMT solver might already do this transformation on its own).
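As a rough JavaSMT-style sketch of the basic constraint (the class, method, and parameter names are made up, and the two optimizations above are not applied):

```java
import java.util.ArrayList;
import java.util.List;
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.IntegerFormulaManager;
import org.sosy_lab.java_smt.api.NumeralFormula.IntegerFormula;

final class MemoryLayoutSketch {
    // addresses.get(i) is the SMT variable for memObj_i, sizes.get(i) its (possibly symbolic) size.
    static BooleanFormula encode(IntegerFormulaManager imgr, BooleanFormulaManager bmgr,
                                 List<IntegerFormula> addresses, List<IntegerFormula> sizes,
                                 long alignment) {
        List<BooleanFormula> constraints = new ArrayList<>();
        IntegerFormula align = imgr.makeNumber(alignment);
        IntegerFormula zero = imgr.makeNumber(0);
        for (int i = 0; i < addresses.size(); i++) {
            // memObj_i % alignment = 0
            constraints.add(imgr.equal(imgr.modulo(addresses.get(i), align), zero));
            // memObj_i + size_i <= memObj_{i+1}
            if (i + 1 < addresses.size()) {
                constraints.add(imgr.lessOrEquals(
                        imgr.add(addresses.get(i), sizes.get(i)), addresses.get(i + 1)));
            }
        }
        return bmgr.and(constraints);
    }
}
```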
Aspects that need to be considered:
- What happens if `size_i` is negative, or `size_i` is so large that we overflow (assuming a 64-bit address range)?
- What happens if the `Alloc` does not get executed in the first place? What are the constraints on `memObj_i` and `size_i` then? Do we need to guard the encoding with something like `exec(Alloc) => ...`?