analyzer
analyzer copied to clipboard
`compinfo` keys not updated incrementally
UpdateCil
updates IDs of statements (sid
) and variables (vid
), but not compinfo
keys (ckey
). I already wondered this in #598 but now here's concrete evidence that not updating them causes issues for incremental updating as well.
Reproduction
Analyze the following program incrementally (before vs after patching): https://github.com/goblint/bench/tree/b79d0997d2bf3444f6296301e0bfc7bdebc8788c/cve-bench/CVE-2019-19537. In order to see the problem, https://github.com/goblint/analyzer/commit/5cfb7e69fe9a4f0ceff1735580200e0e80d41213 needs to be reverted, because it simply hides the problematic accesses. Nevertheless, similar structs and unions could be used in actual user code, which the access analysis shouldn't ignore.
Before
On initial analysis, there's the following memory location with four accesses:
[Warning][Race] Memory location (struct usb_interface).dev(119532124,usb_interface).mutex(1002340388,device).wait_lock(655119850,mutex).__annonCompField18(448810155,spinlock).rlock(344401721,__anonunion____missing_field_name_256509364).raw_lock(706607406,raw_spinlock).__annonCompField17(637443086,arch_spinlock).tickets(201271018,__anonunion____missing_field_name_45265226).head(313990837,__raw_tickets)(119532124) (race with conf. 80):
read with mhp:{tid=[], {usb_register_dev, usb_devnode}} (conf. 80) (file.c:68:2-68:32)
read with mhp:{tid=[], {usb_register_dev}} (conf. 80) (file.c:200:2-200:114)
write with mhp:{tid=[], {usb_register_dev, usb_devnode}} (conf. 80) (file.c:68:2-68:32)
write with mhp:{tid=[], {usb_register_dev}} (conf. 80) (file.c:200:2-200:114)
Here the offset fields have debug printout of the ckey
and cname
of the compinfo
they belong to.
After
After patching, the result differs when analyzed in server mode vs without it.
In server mode
Surprisingly, the number of safe and total memory locations goes up, because the previous four accesses now are split between two slightly different memory locations:
[Success][Race] Memory location (struct usb_interface).dev(119532124,usb_interface).mutex(1002340388,device).wait_lock(655119850,mutex).__annonCompField18(448810155,spinlock).rlock(344401721,__anonunion____missing_field_name_256509364).raw_lock(706607406,raw_spinlock).__annonCompField17(637443086,arch_spinlock).tickets(757927913,__anonunion____missing_field_name_45265226).head(313990837,__raw_tickets)(119532124) (safe):
read with [mhp:{tid=[], {usb_register_dev}}, lock:{minor_rwsem}] (conf. 80) (file.c:201:2-201:114)
write with [mhp:{tid=[], {usb_register_dev}}, lock:{minor_rwsem}] (conf. 80) (file.c:201:2-201:114)
[Warning][Race] Memory location (struct usb_interface).dev(119532124,usb_interface).mutex(1002340388,device).wait_lock(655119850,mutex).__annonCompField18(448810155,spinlock).rlock(344401721,__anonunion____missing_field_name_256509364).raw_lock(706607406,raw_spinlock).__annonCompField17(637443086,arch_spinlock).tickets(201271018,__anonunion____missing_field_name_45265226).head(313990837,__raw_tickets)(119532124) (race with conf. 80):
read with mhp:{tid=[], {usb_register_dev, usb_devnode}} (conf. 80) (file.c:68:2-68:32)
write with mhp:{tid=[], {usb_register_dev, usb_devnode}} (conf. 80) (file.c:68:2-68:32)
The difference in the locations is that the second-to-last field offset tickets
belongs to different structs: for reused accesses it's with key 201271018
and for new accesses it's 757927913
. This ckey
difference in one offset component is enough to consider them completely separate memory locations.
This result is unsound because the old racing accesses also race with the new accesses that have the extra lock.
Without server mode
Outside of server mode, all four accesses remain together under one memory location, as expected.
Explanation
The problem is that compinfo
keys are assigned from nextCompinfoKey
in CIL. In server mode, both before and after programs are parsed during a single Goblint execution, therefore the counter doesn't reset in between. Without server mode, the before and after programs are parsed during separate Goblint execution, thus both start with the same counter value.
Even outside of server mode it only works by coincidence that the number and order of defined structs/unions does not change, such that they get identical keys. I haven't tested this but I suspect if that were to change, this could also cause issues outside of server mode.
So clearly UpdateCil
also needs to fix up compinfo
keys.
Additional notes
The problematic struct-union-emalgamation in Linux kernel headers is this:
typedef struct arch_spinlock {
union {
__ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} arch_spinlock_t;
In CIL:
- The keys are assigned incrementally here: https://github.com/goblint/cil/blob/da5e1941db81643ef0d2133960cd28a7de311131/src/cil.ml#L1498-L1499.
- This is also documented in the interface: https://github.com/goblint/cil/blob/da5e1941db81643ef0d2133960cd28a7de311131/src/cil.mli#L365-L369.
- Contradictorily, the
ckey
field documentation in the implementation says they are based on a name hash: https://github.com/goblint/cil/blob/da5e1941db81643ef0d2133960cd28a7de311131/src/cil.ml#L353-L357. - Turns out that's what
Mergecil
does later: https://github.com/goblint/cil/blob/da5e1941db81643ef0d2133960cd28a7de311131/src/mergecil.ml#L1510.
This hashing explains why all the ckey
s in the above output are so large. But it also poses issues for the incremental updating, because unlike sid
s and vid
s they aren't assigned sequentially (at least in the merged files that get compared).
And it also creates further confusion, because in both cases the corresponding cname
is identical: __anonunion____missing_field_name_45265226
. Nevertheless, somehow they end up with different keys. I'm not sure why, but might be because the anonymous union with a missing field name initially gets a different name and during merging the names are somehow unified but the keys are not?