Dat3M
Performance of alias analysis
Our alias analysis is very slow when run on this program.
These are the times I get (running this requires GlobalSettings.ARCH_PRECISION = 64):
[22.01.2023] 16:43:34 [INFO] AliasAnalysis.fromConfig - Selected Alias Analysis: FIELD_SENSITIVE
[22.01.2023] 16:45:37 [INFO] AliasAnalysis.fromConfig - Done
The C-code of the benchmark contains bit-fiddling on pointers, which our alias analysis cannot handle. Here is an example:
static inline struct task_struct *rt_mutex_owner(struct rt_mutex_base *lock)
{
unsigned long owner = (unsigned long) READ_ONCE(lock->owner);
return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS); // Bit-fiddling on pointer
}
Our alias analysis will assume that loads/stores involving the returned pointer can address anything, due to the bit-fiddling on pointers.
I understand this will have an effect on the precision of the analysis, but is this also related to why running the analysis is so slow?
Let me correct what I said. While the bit-fiddling is problematic, the fact that we load a pointer from memory should already give us a pointer without any aliasing information, so the bit-fiddling doesn't make it worse, I guess.
I don't know exactly how the performance correlates with the precision; the analysis might even get faster due to not being able to derive anything. However, this code is interesting for the alias analysis in general.
This benchmark has around 1200 memory addresses, and I ran out of Java heap space while profiling, but I got some data from that.
All of the algorithm's time (up to the crash after around 3 minutes) is spent in the methods fields (71% of the time) and addAllAddresses (29%).
Both functions are easy to improve; simple rewriting already made it about 6 times faster.

The last issue is just that we store a fresh HashSet per pointer in the program.
Many pointers have the same potential set of addresses they can access, so they could share the same HashSet instance. I think there are many ways to go about this, e.g. canonicalizing every points-to set or finding equivalence classes of pointers in a pre-analysis so that they can be "merged".
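As a rough sketch of the canonicalization idea (hypothetical names, not Dat3M's actual API): intern every points-to set in a map, so that all pointers with the same address set share a single immutable instance instead of each carrying its own HashSet.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: canonicalizes points-to sets so that equal sets are shared.
public final class PointsToInterner<A> {
    private final Map<Set<A>, Set<A>> canonical = new HashMap<>();

    // Returns one shared, immutable representative per distinct address set.
    public Set<A> intern(Set<A> addresses) {
        Set<A> frozen = Set.copyOf(addresses); // immutable copy, safe to use as a map key
        return canonical.computeIfAbsent(frozen, s -> s);
    }
}

Once both sides are interned, set equality degenerates to a reference comparison, and duplicate sets are stored only once.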
Edit: PR #374 addresses the performance loss in fields. The remaining bottleneck is addAllAddresses, which is affected by the above-mentioned problem.
@ThomasHaas you mentioned that you discussed some potential improvements with René (before the improvement that you already implemented). Do those differ from what was already implemented? I.e., can we improve performance even further, or should we close this issue?
What I discussed with René is orthogonal to what I improved so far. Overall there are more optimizable points, but I don't mind closing this issue if you think it is good enough for now.
Right now, it is less blocking than it used to be, but I would not complain if we could make it even faster. Since we already have ideas on how to improve this, let's keep this issue open until we get those done.
@ThomasHaas can you add the LLVM file of tree-RCU to this issue so we have a concrete test showing the problem?
Here is a reduced version of the example: rcu-test-alias.ll.zip. The original takes over an hour and I hope that this one is a bit faster (but still slow). I'm also wondering if the analysis problem is related to poison addresses used in linked lists. Those are clearly problematic, but I'm not sure if the relevant code is ever executed.
Edit: The alias analysis on the reduced version seems to take around 30 minutes on my machine (bound=1).
The alias analysis did not finish overnight for benchmarks/folds/was.c.
@xeren do you have any idea why the performance is so bad in this example?
On what branch do you have that code?
Sorry, that was the spell checker. The code is https://github.com/hernanponcedeleon/Dat3M/blob/master/benchmarks/lfds/wsq.c
BTW ... both the field-sensitive and field-insensitive analyses finish in under 1 sec and seem to produce good results, e.g.
[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_SENSITIVE
[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.53 secs
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig -
wmm.analysis.relationAnalysis: NATIVE
wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:14:47 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 6.150 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.205 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig -
======== RelationAnalysis summary ========
#Relations: 109
#Axioms: 9
#may-edges removed (extended): 67545
#must-edges added (extended): 0
total #must|may|exclusive edges: 1647667|2416852|0
#must|may rf edges: 2|3710
#must|may co edges: 2396|4728
===========================================
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_INSENSITIVE
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.24 secs
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig -
wmm.analysis.relationAnalysis: NATIVE
wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:16:43 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 5.965 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.622 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig -
======== RelationAnalysis summary ========
#Relations: 109
#Axioms: 9
#may-edges removed (extended): 87917
#must-edges added (extended): 0
total #must|may|exclusive edges: 1647667|2522918|0
#must|may rf edges: 2|4952
#must|may co edges: 2396|6084
===========================================
Due to the recent update to mem2reg, we get a lot of warnings regarding potentially uninitialized variables.
My best bet is that the alias analysis has to be conservative on those (assuming a non-deterministic initial value), which causes it to generate lots of aliasing. Notice that newtasks is an array of pointers, so a non-deterministic value allows for aliasing with every object.
~~We need to incorporate reasoning about path conditions to prove that the values are initialized.~~ (Won't work in this example).
EDIT: Actually, in this example proving that we do not access uninitialized registers seems to be very hard. In fact, I don't understand how this code is correct.
if (count >= q.mask) {
    // == (count >= size-1)
    int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1));
    assert(newsize < q.MaxSize);
    Obj *newtasks[STATICSIZE]; // This gets mem2reg'd
    int i;
    for (i = 0; i < count; i++) {
        int temp = (h + i) & q.mask;
        newtasks[i] = q.elems[temp];
    }
    for (i = 0; i < newsize; i++) {
        q.elems[i] = newtasks[i]; // This is only initialized by the above loop if "newsize <= count".
    }
Consider the entry condition count >= q.mask with a count that is only slightly larger than the mask, and q.mask != 0. Then int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1)); easily leads to a value that is larger than count (e.g. q.mask == 3 and count == 4 give newsize == 8, so the second loop reads newtasks[4..7], which the first loop never wrote).
This in turn really causes uninitialized values to get accessed. Isn't this UB?
I agree it is not trivial to know whether the code has UB (even with the patch below). However, the patch below fixes all the warnings, and the alias analysis still seems to have problems:
diff --git a/benchmarks/lfds/wsq.c b/benchmarks/lfds/wsq.c
index 2831dac3a..142e05ad0 100644
--- a/benchmarks/lfds/wsq.c
+++ b/benchmarks/lfds/wsq.c
@@ -14,7 +14,7 @@
#define STEAL_ATTEMPS 1
void *stealer(void *unused) {
- Obj *r;
+ Obj *r = NULL;
for (int i = 0; i < STEAL_ATTEMPS; i++) {
if (steal(&r)) {
operation(r);
@@ -41,14 +41,14 @@ int main(void) {
for (int i = 0; i < ITEMS / 2; i++) {
push(&items[2 * i]);
push(&items[2 * i + 1]);
- Obj *r;
+ Obj *r = NULL;
if (pop(&r)) {
operation(r);
}
}
for (int i = 0; i < ITEMS / 2; i++) {
- Obj *r;
+ Obj *r = NULL;
if (pop(&r)) {
operation(r);
}
diff --git a/benchmarks/lfds/wsq.h b/benchmarks/lfds/wsq.h
index c992bf794..0e620b419 100644
--- a/benchmarks/lfds/wsq.h
+++ b/benchmarks/lfds/wsq.h
@@ -209,6 +209,9 @@ void syncPush(Obj* elem) {
int temp = (h + i) & q.mask;
newtasks[i] = q.elems[temp];
}
+ for (i = count; i < STATICSIZE - count; i++) {
+ newtasks[i] = NULL;
+ }
for (i = 0; i < newsize; i++) {
q.elems[i] = newtasks[i];
}
I haven't checked Dartagnan on your patch, but I would be surprised if the patch really changed much. Since the initializing loop has dynamic bounds (because count is dynamic), I don't think that Dartagnan is able to figure out that newtasks is really initialized (this requires far more intricate reasoning).
> The alias analysis did not finish overnight for benchmarks/folds/was.c.
> @xeren do you have any idea why the performance is so bad in this example?

This program seems to contain negative-weight cycles, which seem to put IBPA into an endless loop. The best fix appears to be setting TOP = List.of(-1) (what it actually should be in theory). Then it takes 0.5 secs on my machine. I did not define it like that in the first place, because it lost precision in other programs compared to the older analyses.
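To illustrate why negative-weight cycles are a problem, here is a minimal, self-contained sketch (not Dat3M's actual IBPA implementation; the cycle weight of -4 and the widening threshold are made up): the fixpoint over a pointer's possible offsets never stabilizes, because each pass around the cycle produces new, smaller offsets. Widening the offset set to a TOP sentinel, as with TOP = List.of(-1), forces termination at the cost of precision.

import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NegativeCycleSketch {
    static final List<Integer> TOP = List.of(-1); // sentinel meaning "any offset"

    public static void main(String[] args) {
        Set<Integer> offsets = new TreeSet<>(Set.of(0)); // pointer starts at &A + 0
        final int widenAfter = 100;                      // hypothetical widening threshold
        boolean widened = false;

        while (!widened) {
            // One propagation step around the cycle: every known offset also yields offset - 4.
            Set<Integer> next = new TreeSet<>(offsets);
            for (int o : offsets) {
                next.add(o - 4);
            }
            if (next.equals(offsets)) {
                break; // fixpoint reached -- never happens with a negative-weight cycle
            }
            offsets = next;
            if (offsets.size() > widenAfter) {
                widened = true; // give up on enumerating offsets and fall back to TOP
            }
        }
        System.out.println(widened ? "offsets widened to TOP = " + TOP : "offsets = " + offsets);
    }
}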
> I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

Does this mean that "in theory" it is not the case that the default algorithm is always better than the others?
IIRC, the old algorithms were just unsound in those cases and that is why they were "more precise".
Exactly. In terms of precision, "in theory" cannot really apply here. In practice, the fixed default is sometimes less precise.
Version 1 (FIELD_INSENSITIVE) truncates out-of-bounds addresses every time, in order to avoid looping on non-zero-weight cycles.
Version 2 (FIELD_SENSITIVE) basically kept this issue, but also introduced a provenance assumption. Then there was a loss in precision in some litmus tests featuring pointer sets starting with the last field of an object (e.g. &A+12+4x on a 16-byte object A). There was only one possible location, so this generated more MUST-alias relationships. To enforce more precision everywhere, I made the assumption that x is non-negative.
In Version 3 (FULL), I replaced the out-of-bounds filter with a different handling of positive-weight cycles, but I kept the assumption from before, which led to this issue.