analyzer icon indicating copy to clipboard operation
analyzer copied to clipboard

should not `eval` unread globals

Open vogler opened this issue 4 years ago • 10 comments

Why do we eval and thereby add_infl earlyglobs variables which are written but not read? In the example, g2 is read, so the eval is fine, but afterwards in the trace it also evaluates g1 which adds an unwanted dependency. Where does this come from? sync/mutex stuff? Can it be removed? This might cause a lot of unnecessary recomputation.

// ./goblint -v add_infl.c --enable exp.earlyglobs --enable ana.int.interval --html --trace sol2
int g1, g2;

int main() {
  g1 = g2 + 1;
}
%%% sol2: solve node 1 "g1 = g2 + 1;" on add_infl.c:4, called: false, stable: true
%%% sol2: init node 1 "g1 = g2 + 1;" on add_infl.c:4
%%% sol2: add_infl node 1 "g1 = g2 + 1;" on add_infl.c:4 node 2 "return (0);" on add_infl.c:5
%%% sol2: eval node 2 "return (0);" on add_infl.c:5 ## g2 on add_infl.c:1
%%% sol2: simple_solve g2 on add_infl.c:1 (rhs: false)
%%% sol2: init g2 on add_infl.c:1
%%% sol2: add_infl g2 on add_infl.c:1 node 2 "return (0);" on add_infl.c:5
%%% sol2: eval node 2 "return (0);" on add_infl.c:5 ## g1 on add_infl.c:1
%%% sol2: simple_solve g1 on add_infl.c:1 (rhs: false)
%%% sol2: init g1 on add_infl.c:1
%%% sol2: add_infl g1 on add_infl.c:1 node 2 "return (0);" on add_infl.c:5

The example above is extracted from this which has the same issue (also with --sets exp.privatization none):

// ./goblint -v side_cycle.c --enable exp.earlyglobs --enable allfuns --enable ana.int.interval --trace sol2
#include <pthread.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

int g1, g2;

void f() {
  pthread_mutex_lock(&mutex);
  g2 = g1 + 1;
  pthread_mutex_unlock(&mutex);
}

int main() {
  pthread_mutex_lock(&mutex);
  g1 = g2 + 1;
  pthread_mutex_unlock(&mutex);
}

vogler avatar Jun 16 '21 14:06 vogler

I immediately remembered this piece of code: https://github.com/goblint/analyzer/blob/0f9f068ee53e181405515191ad8b9ffe66800fc9/src/analyses/base.ml#L1031-L1034

It's reading the global before writing to it. My understanding is that this is necessary for writing into structs/unions/arrays, such that update_offset can have a structure in which to make the modification at a given offset and return the result to be written.

I suppose for some types this reading isn't really necessary, but that probably requires some refactoring. Maybe make the read_global lazy and give it to update_offset, which can do the read if it needs that for a struct or not do it. To an extent it seems to be matching the read abstract value to determine what it can do and what would be some weird type conflict.

sim642 avatar Jun 16 '21 14:06 sim642

Mh, do we need read_global at all? Couldn't we just update_offset for a bot of that global's type since the side-effects/contributions are joined?

vogler avatar Jun 16 '21 19:06 vogler

I'm not too familiar with the logic there so I don't know, but this thought popped into my mind as well. I guess one must carefully check that they decompose (distribute?) nicely like that.

It's probably this way just because the local updates, which are actually destructive, happen through a similar read-update-write pattern (with more bells and whistles).

sim642 avatar Jun 17 '21 05:06 sim642

Couldn't we just update_offset for a bot of that global's type since the side-effects/contributions are joined?

I think this works well for things of integral types. On the other hand, for structs and arrays there are complications:

Caused by privatization

Consider, e.g., If I have a global struct g with members a and b that are both int, that is protected by m.

lock(m)
g.b = 5; // g gets privatized here. If we do update_offset with bot we get {a -> bot, b -> 5}
int x = g.a; // g.a is read from the privatized value (this should not be bot!)
unlock(m)

Issues with more involved domains

E.g. the partitioned arrays need to actually read the abstract value, even just to do an update to one element (if one has any hope of being precise). Also, we have a student who is working on some minimal relational domain for structs. There, an update would probably also need to consider the entire abstract value.

If we end up introducing it, we should do in a way that does not make this update_offset and set even more complicated imo. But if we find a clean way for that (e.g. by only doing it for integral types), I'm all for it!

michael-schwarz avatar Jun 17 '21 07:06 michael-schwarz

Yes, if it's local, you can't do it. I meant only for side-effects to globals. That's what the snippet above is about, right? What's the issue with e.g. partitioned arrays? Would writes to addresses containing an index then also be a problem? The easy fix would be to write plain integers without read first and keep the read for composite types. However, it would be more elegant if it worked with just updating bot in any case.

vogler avatar Jun 17 '21 07:06 vogler

That's what the snippet above is about, right?

This snippet calls Priv.write_global which then decides based on the activated privatization, if a global should be side-effected (and if yes to which global) and/or added to the local state (as it will be side-effected later, e.g. when a protecting mutex is lost).

michael-schwarz avatar Jun 17 '21 08:06 michael-schwarz

Ok, that makes it more complicated with all the different implementations for it 🙈 Could we still check in base if it's going to be side-effected by asking if it's not private? If we know that, we can try the update bot thing or - if it's making problems for composites - just do it for integers. Lazy read also sounds good, but idk how complicated that would be.

vogler avatar Jun 17 '21 08:06 vogler

What's the issue with e.g. partitioned arrays?

Hmm, I tried to think it through, but I guess the combination of them being zero-initialized and that expressions containing globals can never be used to partition an array means that there are actually no cases where it is bad for precision, as precision of partitioned global arrays (in multithreaded mode) is pretty bad anyway (unless they are partitioned according to a constant, in which case this write without consulting the value first does not actually harm precision).

michael-schwarz avatar Jun 17 '21 08:06 michael-schwarz

Could we still check in base if it's going to be side-effected by asking if it's not private?

The complication is that it is very often side-effected and also added to the local state (e.g. protection based reading any write will immediately be published to unknown g' but still added to the local state if it is protected).[1] So one would have to check if things will only be side-effected and not also added to the local state. I don't think that check is there yet, but should probably also not be so hard to implement. However, this will not hold very often (e.g. only for unprotected globals in the default setting).

If we know that we can try the update bot thing or (...) just do it for integers.

For integers we could do it regardless of all the other considerations. So maybe that is a good first step?


[1]: Some of these variants are explained here: https://versioncontrolseidl.in.tum.de/seidl/traces

michael-schwarz avatar Jun 17 '21 08:06 michael-schwarz

We discussed this again at the GobCon, posting here to recap: The issue is still the way this would interact with privatization.

However, we have now reached the consensus that we could implement it if:

  • we only do it if earlyglobs is on
  • we make earlylgobals incompatible with any of the privatizations (downside is that earlyglobs is somewhat sensible even combined with privatization given that globals are only written from a multi-threaded context)

Alternatively, we could only do it if privatization is none and implement it in the none privatization.

michael-schwarz avatar Feb 02 '22 08:02 michael-schwarz