Example of -fanalyzer with gccrs

For fun, I tried running gccrs with -fanalyzer, and it seems to work. Here's an example of it detecting a double-free in unsafe code:

https://godbolt.org/z/3PrKTP8bs

// This doesn't seem to work yet, so let's use i8 instead...
// use std::ffi::c_void;

extern "C" {
    //fn free(p: *const c_void);
    fn free(p: *const i8);
}

fn call_free (s: *const i8) {
    unsafe {
        free (s);
    }
}

pub fn test(flag: bool, s: *const i8) {
    call_free (s);
    if (flag) {
        call_free (s);
    }
}

for which I get this output in Compiler Explorer:

<source>:8:1: warning: function is never used: 'call_free'
    8 | fn call_free (s: *const i8) {
      | ^
<source>:14:5: warning: function is never used: 'test'
   14 | pub fn test(flag: bool, s: *const i8) {
      |     ^
<source>: In function 'example::call_free':
<source>:10:9: warning: double-'free' of 's_2(D)' [CWE-415] [-Wanalyzer-double-free]
   10 |         free (s);
      |         ^
  'example::test': events 1-2
    |
    |   14 | pub fn test(flag: bool, s: *const i8) {
    |      |     ^
    |      |     |
    |      |     (1) entry to 'example::test'
    |   15 |     call_free (s);
    |      |     ~
    |      |     |
    |      |     (2) calling 'example::call_free' from 'example::test'
    |
    +--> 'example::call_free': events 3-4
           |
           |    8 | fn call_free (s: *const i8) {
           |      | ^
           |      | |
           |      | (3) entry to 'example::call_free'
           |    9 |     unsafe {
           |   10 |         free (s);
           |      |         ~
           |      |         |
           |      |         (4) first 'free' here
           |
    <------+
    |
  'example::test': events 5-8
    |
    |   15 |     call_free (s);
    |      |     ^
    |      |     |
    |      |     (5) returning to 'example::test' from 'example::call_free'
    |   16 |     if (flag) {
    |      |     ~
    |      |     |
    |      |     (6) following 'true' branch (when 'flag_5(D) != 0')...
    |   17 |         call_free (s);
    |      |         ~
    |      |         |
    |      |         (7) ...to here
    |      |         (8) passing freed pointer 's_3(D)' in call to 'example::call_free' from 'example::test'
    |
    +--> 'example::call_free': events 9-10
           |
           |    8 | fn call_free (s: *const i8) {
           |      | ^
           |      | |
           |      | (9) entry to 'example::call_free'
           |    9 |     unsafe {
           |   10 |         free (s);
           |      |         ~
           |      |         |
           |      |         (10) second 'free' here; first 'free' was at (4)
           |
ASM generation compiler returned: 0

Not sure if this is at all useful, given that both gccrs and -fanalyzer are experimental, but it was fun, and is hopefully of interest to gccrs developers.

Jul 20 '22 21:07 davidmalcolm

FWIW it also compiles without the unsafe around the free; presumably that's a known bug?

Jul 20 '22 22:07 davidmalcolm

Oh wow, this is really cool. Thank you for checking that out, @davidmalcolm. Regarding the missing unsafe check: yeah, we haven't had the time to implement proper unsafe checks yet, but they're coming!

Jul 21 '22 07:07 CohenArthur

Cool! I can definitely see this complementing Miri in the future! (In case you didn't know, Miri is an interpreter for Rust that performs a lot of UB checks. As a dynamic analysis, it can only catch UB that is actually hit during a run, not UB that could happen given other inputs. It is also rather slow.) Would it be possible in the future to add Rust-specific UB rules to the analyzer, e.g. for stacked borrows (a proposal for a memory model), or does the infrastructure of the analyzer not allow language-specific rules?
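
To make the static-versus-dynamic point concrete, here is a minimal sketch that mirrors the shape of `test` from the original post. It is not Miri and not the analyzer: `Tracked`, `release`, `double_freed` and `run_once` are hypothetical names, and the "allocation" only counts releases so that nothing here is actually undefined behaviour.

```rust
/// Hypothetical stand-in for a heap allocation: it only counts releases,
/// so the example runs without any real undefined behaviour.
#[derive(Default)]
pub struct Tracked {
    frees: u32,
}

impl Tracked {
    /// Models a call to `free`.
    pub fn release(&mut self) {
        self.frees += 1;
    }

    /// True once the allocation has been released more than once.
    pub fn double_freed(&self) -> bool {
        self.frees > 1
    }
}

/// Same control flow as `test` in the original post.
pub fn test(flag: bool, t: &mut Tracked) {
    t.release();
    if flag {
        t.release();
    }
}

/// What a single dynamic run with one concrete input can observe.
pub fn run_once(flag: bool) -> bool {
    let mut t = Tracked::default();
    test(flag, &mut t);
    t.double_freed()
}
```

A run with `flag == false` never reaches the second release, so a checker that only observes that run has nothing to report. A path-exploring static analysis considers the `flag == true` branch as well, which is where the double-free in the original output comes from.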

Jul 21 '22 08:07 bjorn3

Nice! Generic GCC abstractions for the win -- "it just works"! :-)

Unless anybody gets there first, I shall turn this into a GCC/Rust test case.

Jul 21 '22 09:07 tschwinge

> Cool! I can definitely see this complementing Miri in the future! (In case you didn't know, Miri is an interpreter for Rust that performs a lot of UB checks. As a dynamic analysis, it can only catch UB that is actually hit during a run, not UB that could happen given other inputs. It is also rather slow.)

-fanalyzer itself can be slow.

> Would it be possible in the future to add Rust-specific UB rules to the analyzer, e.g. for stacked borrows (a proposal for a memory model), or does the infrastructure of the analyzer not allow language-specific rules?

Architecturally, -fanalyzer runs as an interprocedural pass inside GCC's middle-end, operating on the gimple-SSA representation (which is rather late compared to most analysis tools; in particular some optimizations have already run). I chose this point in order to piggy-back off of LTO support, so that I can do cross-TU analysis (although any real-world use of LTO analysis tends to explode in complexity due to bugs in my stack management and call-summarization code). I might move it earlier at some point, but it's likely to always be on gimple.

So it "knows" about gimple, but it probably has a bunch of C assumptions.

What would Rust-specific UB rules look like at the gimple level? If they're expressible in terms of gimple, then the answer to whether they're possible is "yes, perhaps" (but with the caveat that I already have far too much on my plate :) )
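
As a rough illustration of what "expressible in terms of statements" could mean, here is a toy checker that walks one execution path and tracks a per-pointer state. This is a sketch under strong assumptions, not how GCC implements the analyzer: `Stmt`, `Diagnostic` and `check_path` are hypothetical names, and real gimple is far richer than this two-variant enum.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified statements (not real gimple).
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Models `free(ptr)`.
    Free(&'static str),
    /// Models a store through `ptr`.
    Store(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum Diagnostic {
    DoubleFree { ptr: &'static str, first: usize, second: usize },
    UseAfterFree { ptr: &'static str, freed_at: usize, used_at: usize },
}

/// Walk a single path and report statements that touch an already-freed pointer.
pub fn check_path(path: &[Stmt]) -> Vec<Diagnostic> {
    // Maps each freed pointer name to the index of its first `Free`.
    let mut freed_at: HashMap<&'static str, usize> = HashMap::new();
    let mut out = Vec::new();
    for (i, stmt) in path.iter().enumerate() {
        match stmt {
            Stmt::Free(p) => {
                if let Some(&first) = freed_at.get(p) {
                    out.push(Diagnostic::DoubleFree { ptr: *p, first, second: i });
                } else {
                    freed_at.insert(*p, i);
                }
            }
            Stmt::Store(p) => {
                if let Some(&at) = freed_at.get(p) {
                    out.push(Diagnostic::UseAfterFree { ptr: *p, freed_at: at, used_at: i });
                }
            }
        }
    }
    out
}
```

In this model, a language-specific rule would be one more statement kind plus one more match arm, which is only possible if the property in question is still visible in the statements being walked.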

Jul 21 '22 19:07 davidmalcolm

> Unless anybody gets there first, I shall turn this into a GCC/Rust test case.

Please go for it!

Jul 21 '22 19:07 davidmalcolm

> > Cool! I can definitely see this complementing Miri in the future! (In case you didn't know, Miri is an interpreter for Rust that performs a lot of UB checks. As a dynamic analysis, it can only catch UB that is actually hit during a run, not UB that could happen given other inputs. It is also rather slow.)

> -fanalyzer itself can be slow.

A common estimate is that Miri runs about 1000x slower than the compiled executable when stacked borrows checks are enabled (as is the default), so Miri is also rather slow.

> What would Rust-specific UB rules look like at the gimple level? If they're expressible in terms of gimple, then the answer to whether they're possible is "yes, perhaps"

Makes sense. Rust lacks several kinds of UB that C has. For example, signed integer overflow either panics or wraps around; it is never UB. (It would be fine for the analyzer to report an error on any integer overflow, be it signed or unsigned.) Rust also doesn't have typed memory, so writing a value through a pointer of one type and loading it again through a pointer of another type is fine, assuming that the raw bytes written form a valid value for the type being read (for example, no uninitialized bytes for integers, and no value other than 0 or 1 for bools).

On the other hand, the memory model has much stricter aliasing rules that roughly imitate the borrow checker at runtime. For example, you can't write anything through an &u8, only through an &mut u8 or through a *const u8/*mut u8 that was not derived from an &u8. And if you write through an &mut u8, that may invalidate other references to the same place. There are no definitive rules yet, but stacked borrows is the most promising proposal at the moment. @RalfJung (who authored this proposal and implemented it in Miri) has written a paper about it and a couple of blog posts. https://www.ralfj.de/blog/2019/11/18/stacked-borrows-paper.html references them.
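
A small sketch of the three points above, using only standard library calls. The function names (`overflow_demo`, `pun_f32_as_u32`, `write_through_raw`) are made up for illustration, and the aliasing part only shows the permitted case, since the forbidden one would be UB.

```rust
/// Signed overflow is never UB in Rust: it is either detected or wraps.
pub fn overflow_demo() -> (Option<i32>, i32, (i32, bool)) {
    let max = i32::MAX;
    (
        max.checked_add(1),     // overflow is detected and reported as None
        max.wrapping_add(1),    // two's-complement wrap-around
        max.overflowing_add(1), // wrapped value plus an overflow flag
    )
}

/// No typed memory: write the bytes as `f32`, read them back as `u32`.
pub fn pun_f32_as_u32(x: f32) -> u32 {
    let mut slot: f32 = 0.0;
    let p: *mut f32 = &mut slot;
    unsafe {
        p.write(x);
        // Fine: size and alignment match, and every initialized
        // 4-byte pattern is a valid `u32`.
        (p as *const u32).read()
    }
}

/// Aliasing: writing through a raw pointer derived from `&mut u8` is allowed.
/// Deriving the pointer from a shared `&u8` and writing through it would be UB,
/// so that case is deliberately not shown as running code.
pub fn write_through_raw(v: &mut u8) -> u8 {
    let p: *mut u8 = v;
    unsafe {
        p.write(7);
    }
    *v
}
```

Note that the plain `+` operator on `i32::MAX` panics in builds with overflow checks enabled (the default for debug builds) and wraps otherwise; the explicit methods above make the chosen behaviour independent of build settings.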

> (but with the caveat that I already have far too much on my plate :) )

Of course. I was just curious whether it was possible in the first place. I don't think it makes sense to implement the entirety of stacked borrows checking, at least not yet. As for the UB that Rust lacks, part of it may already be handled, given that correct lowering to gimple is necessary anyway to prevent miscompilations. I don't know whether -fanalyzer depends on typed memory, though.

Jul 21 '22 19:07 bjorn3