analyzer
analyzer copied to clipboard
Enhance detection of buffer overflows by introducing variable for length of array/blob
A possible direction that we could go in to convince people that our analysis of globals is useful would be to target the absence of buffer overflows (our if we want to go for something more sophisticated, the absence of buffer overflows in string manipulation).
A critical ability there would be to relate the length of a buffer with variables used to access things inside the buffer. Currently, this only works for the non-relational case.
int readUntil(char arr[], int len) {
for(int i=0;i < len;i++) {
char s = arr[i];
}
}
int main() {
int len;
int top;
if(top) {
len = 5;
} else {
len = 10;
}
// char* ptr = malloc(sizeof(char)*len);
char ptr[len];
readUntil(ptr, len);
}
With --enable ana.arrayoob, we warn at the access in readUntil because we do not know that the length of arr is len here (and also, for some reason, when we call readUntil in main(?)). Also, we currently do not warn at all for access of buffers created by malloc, which is why I used the VLA here.
TODOs:
- [ ] Make out of bounds check apply equally for dynamically allocated memory
- [ ] Track symbolic relationships between the length of an array and programs variables (just using the relationship between
lenand the expression used to access the array would not suffice as that could be reassigned later)- [ ] Ghost variable
len$array$that is set tolenon malloc and then automatically tracks the length
- [ ] Ghost variable
- [ ] Figure out how (if) this would work for globals (which would be the case we actually care about)
- Ghost variable
len$array$that is set tolenon malloc and then automatically tracks the length
How do you imagine this working? That there's a pre-transformation of the AST to introduce these such that a relational analysis could track that?
How do you imagine this working? That there's a pre-transformation of the AST to introduce these such that a relational analysis could track that?
Maybe. The nicer thing would be to do it without transformations to the source code at the call to malloc and at the VarDecl for a VLA. That way, one would have distinct ghost variables for things that our heap analysis creates distinct things for, which may be helpful...
But I need to think it through to see if there are some further issues with this idea.
A student of @jerhard is working on a version of this.
By the way, this is one of the selling points for Mopsa. For example, see their SV-COMP 2023 tool paper or Monat's PhD thesis. Although their solution is significantly more elegant because their analyzer design is specifically made to allow such things in an extremely modular way.