analyzer
analyzer copied to clipboard
__LINE__ in macro can cause several functions to be changed
When benchmarking zstd I discovered that for some commits the runtime of the incremental analysis with reluctant destabilization was noticeable worse than for the basic incremental run. For these commits also several functions were detected as being changed even though only a very little number of relevant lines of code had changed. The reason for this seems to be that
- the changed file contains a macro using
__LINE__ - an added/removed line somewhere in the file entails that at all following occurrences of the macro, a change is detected because the line number is different
- this leads to a bunch of changed functions in the respective file for which new ids are generated
- if one of them contains a
malloc, a different heap variable is created - we suspect that because of that, even with reluctant destabilization, the changed function containing the
mallocwill need to be fully destabilized
A commit where this problem can for example be observed is https://github.com/facebook/zstd/commit/a8adfa7f67840dec0fd94f25eecfb4abda208f23. It has 2 relevant changed lines of code, 17 changed functions, took ~260s to analyze incrementally and ~390s with reluctant destabilization (the non-incremental analysis of the parent commit took ~1220s).
--set pre.cppflags[+] -D__LINE__=0 works (and similarly for others, e.g. __FILE__) and just makes the preprocessor whine a bit. Not sure if there's any better solution.
EDIT: We could also supply a custom header that forces corresponding #defines, but it should probably be disabled by default.
The standard might not like that though:
Attempts to redefine or undefine these macros result in undefined behavior.
(https://en.cppreference.com/w/c/preprocessor/replace)
Redefining __LINE__ might work, but it is a bit risky. It may, e.g. (as Sarah pointed out to me yesterday), lead to miscompilation for Macros that contain labels and use __LINE__ to make sure the label in each expansion of the macro has a unique name.
With these really huge codebases, such things do happen and while this will hopefully cause CIL to just reject the file, I'm a bit worried CIL might just continue with a warning, which will make us unsound. (Not even talking about the obvious unsoundness if you have sth like int i = __LINE__;
We were thinking about some less brutal options, such as silently patching constant int arguments to functions that do not have a body, and for which we do not use the special transfer function (we should be able to decide this ahead of our analysis based on our list of library functions). This would of course not cover all instances, but at least the one we were seeing in the benchmark, and maybe that is good enough for now?
We were thinking about some less brutal options, such as silently patching constant int arguments to functions that do not have a body, and for which we do not use the
specialtransfer function (we should be able to decide this ahead of our analysis based on our list of library functions). This would of course not cover all instances, but at least the one we were seeing in the benchmark, and maybe that is good enough for now?
I don't really understand how this is related because __LINE__ gets replaced with integer constants by the preprocessor, before CIL and let alone Goblint sees anything. At that point those constants are indistinguishable from literal constants in the original source.
In zstd's case, it might be easiest to just disable DEBUG, so these __LINE__ constants should get totally preprocessed away: https://github.com/facebook/zstd/blob/698f261b35eb256921d95a36ce5af0c29e4df45e/zlibWrapper/examples/zwrapbench.c#L88-L102.
At that point those constants are indistinguishable from literal constants in the original source.
Yes, but for functions for which we do not have a body (and which are not special, i.e.malloc and friends), we don't do anything with constant int arguments, so we can just silently patch it (to the new value) and treat the code as unchanged, right?
Kind of off-topic, but the code in https://github.com/facebook/zstd/commit/a8adfa7f67840dec0fd94f25eecfb4abda208f23 looks suspiciously weird:
if (fileSize == UTIL_FILESIZE_UNKNOWN)
EXM_THROW(32, "This file format is not supported : Dictionary file %s\n", fileName);
{
// ...
}
That block isn't even attached to the if. Maybe it's intentional, but that's Apple's SSL exploit-level misleading style for sure.
Yes, but for functions for which we do not have a body (and which are not
special, i.e.mallocand friends), we don't do anything with constant int arguments, so we can just silently patch it (to the new value) and treat the code as unchanged, right?
So if I'm getting this right, what you have in mind is very specific to zstd's case, where __LINE__ is used directly as the argument of fprintf, which no analysis currently handles?
I'd rather just preprocess to disable those incrementally-fragile macros of a particular project (which has zero unsoundness risk!) than bake lots of assumptions into Goblint:
__LINE__is used as a direct argument of an undefined function (likefprintf).- No Goblint analysis (including in the future) cares about the integer constants of
specialcall arguments. Besidesmallocand friends, it's not unimaginable that one day we might have a better out of bounds analysis, which covers a few standard memory management functions or things likesnprintfthat also rely on integer constants in their arguments. - The function call is direct, not via a pointer, which we cannot syntactically detect.
- The project doesn't use a custom defined output function with a line argument. Any wrapper function would completely defeat this hack.
- Output based on
__LINE__isn't constructed via the preprocessor itself. For example (classical preprocessing technique):#define STR_HELPER(x) #x #define STR(x) STR_HELPER(x) char* i = "line:" STR(__LINE__); // string literal "line:3"
I mean when one describes it like that, I think it sounds worse then it is :wink: