C++ Function Call to Undefined Function
On lgtm.com I ran the following query c++ on openssl/openssl:
import cpp
from Call c
where not c.getTarget().hasDefinition() and c.getTarget().hasGlobalName("BUF_MEM_free")
select c, "TEST"
It produces several instances of calls to BUF_MEM_free where codeql thinks the definition does not exist. It only finds the stub in buffer.h. If I generate a similar query for calls where a definition is known, it also finds many instances, pointing to the definition in buffer.c.
I'm trying to understand why there is a discrepancy in finding function definitions. I'm guessing there is probably some dynamic loading going on, but I wanted to verify there isn't some deeper bug here. Also, I was wondering if there are solutions to finding the possible definition other than matching functions with definitions to the fully qualified function name at the call.
I'm trying to understand why there is a discrepancy in finding function definitions. I'm guessing there is probably some dynamic loading going on
This is indeed most likely due to dynamic loading. However, the behaviour is a bit odd, as it differs from the behaviour of similar kinds of queries you could write for Classes. In the latter case all the declarations are "folded" into the unique definition when such a unique definition exists (even when there's dynamic loading).
For functions I would like to move to similar behaviour as for classes: if given some function we can identify a single unique definition we "fold" all the declarations into that definition, if we cannot find such a definition we do nothing (so you'll see what you see now). The latter can happen when there's no definition, or when there are multiple implementations (across different binaries). This is of course an approximation of what is going on. We cannot statically analyse the dynamic loading behaviour.
Also, I was wondering if there are solutions to finding the possible definition other than matching functions with definitions to the fully qualified function name at the call.
Not at the moment. I do wonder if we should provide something like that.
@jketema Thanks. I was thinking there should be some kind of folding. I was going to simulate it myself but it is currently easier just to ignore these situations explicitly. If there is an opened issue for this to be folded at some point, I'd like to follow it. Thanks.
If there is an opened issue for this to be folded at some point, I'd like to follow it.
That's an internal issue, so unfortunately I cannot share it. However, I did link this issue from there, so we don't forget to report back here once the folding is in place.
I've been stumbling on this problem again on other code bases where targets either are in .h files. I've seen this for ordinary functions and for virtual functions. I'm forced to modify path traces and other logic to fold possible definitions in. Is there any movement on this issue in general?
Apologies for the slow response. I looked at this briefly, but ran into - what seemed to be - some quite fundamental problems. I'll try to have another look soon.
@jketema this problem basically bites me on every query and resolution predicates I've made are seemingly expensive at scale. Is there any update on the problem or fix and a potential eta or efficient work arounds? Thanks.
this problem basically bites me on every query
Understood.
Is there any update on the problem or fix and a potential eta or
Not really. It depends a bit on how difficult it is to solve the more fundamental issue I ran into while looking at this.
For functions I would like to move to similar behaviour as for classes: if given some function we can identify a single unique definition we "fold" all the declarations into that definition, if we cannot find such a definition we do nothing (so you'll see what you see now). The latter can happen when there's no definition, or when there are multiple implementations (across different binaries). This is of course an approximation of what is going on. We cannot statically analyse the dynamic loading behaviour.
We merged a PR that implements this behaviour. This should be available in CodeQL 2.16.0.
We unfortunately had to revert the change due to performance problems. Re-opening this issue.
@jketema , any idea on an ETA of a change? I.e., should we expect regressions in the near future on our end or do you think we will have a patch in before the next tag release/
Can't say anything about the ETA. There shouldn't be any regressions if you use released versions. This was too late for the last 2.15.x release, and now just won't make it in the 2.16.0.
We've merged a new version of the change https://github.com/github/codeql/pull/15421, which should make it into 2.16.2. Note that the change will only have an effect on databases created with 2.16.2 and later.