codeql icon indicating copy to clipboard operation
codeql copied to clipboard

C++ Function Call to Undefined Function

Open bdrodes opened this issue 3 years ago • 4 comments

On lgtm.com I ran the following query c++ on openssl/openssl:

import cpp

from Call c
where not c.getTarget().hasDefinition() and c.getTarget().hasGlobalName("BUF_MEM_free") 
select c, "TEST"

It produces several instances of calls to BUF_MEM_free where codeql thinks the definition does not exist. It only finds the stub in buffer.h. If I generate a similar query for calls where a definition is known, it also finds many instances, pointing to the definition in buffer.c.

I'm trying to understand why there is a discrepancy in finding function definitions. I'm guessing there is probably some dynamic loading going on, but I wanted to verify there isn't some deeper bug here. Also, I was wondering if there are solutions to finding the possible definition other than matching functions with definitions to the fully qualified function name at the call.

bdrodes avatar Jul 11 '22 16:07 bdrodes

I'm trying to understand why there is a discrepancy in finding function definitions. I'm guessing there is probably some dynamic loading going on

This is indeed most likely due to dynamic loading. However, the behaviour is a bit odd, as it differs from the behaviour of similar kinds of queries you could write for Classes. In the latter case all the declarations are "folded" into the unique definition when such a unique definition exists (even when there's dynamic loading).

For functions I would like to move to similar behaviour as for classes: if given some function we can identify a single unique definition we "fold" all the declarations into that definition, if we cannot find such a definition we do nothing (so you'll see what you see now). The latter can happen when there's no definition, or when there are multiple implementations (across different binaries). This is of course an approximation of what is going on. We cannot statically analyse the dynamic loading behaviour.

Also, I was wondering if there are solutions to finding the possible definition other than matching functions with definitions to the fully qualified function name at the call.

Not at the moment. I do wonder if we should provide something like that.

jketema avatar Jul 12 '22 07:07 jketema

@jketema Thanks. I was thinking there should be some kind of folding. I was going to simulate it myself but it is currently easier just to ignore these situations explicitly. If there is an opened issue for this to be folded at some point, I'd like to follow it. Thanks.

bdrodes avatar Jul 12 '22 15:07 bdrodes

If there is an opened issue for this to be folded at some point, I'd like to follow it.

That's an internal issue, so unfortunately I cannot share it. However, I did link this issue from there, so we don't forget to report back here once the folding is in place.

jketema avatar Jul 12 '22 19:07 jketema

I've been stumbling on this problem again on other code bases where targets either are in .h files. I've seen this for ordinary functions and for virtual functions. I'm forced to modify path traces and other logic to fold possible definitions in. Is there any movement on this issue in general?

bdrodes avatar Sep 29 '22 14:09 bdrodes

Apologies for the slow response. I looked at this briefly, but ran into - what seemed to be - some quite fundamental problems. I'll try to have another look soon.

jketema avatar Oct 07 '22 12:10 jketema

@jketema this problem basically bites me on every query and resolution predicates I've made are seemingly expensive at scale. Is there any update on the problem or fix and a potential eta or efficient work arounds? Thanks.

bdrodes avatar Oct 19 '22 18:10 bdrodes

this problem basically bites me on every query

Understood.

Is there any update on the problem or fix and a potential eta or

Not really. It depends a bit on how difficult it is to solve the more fundamental issue I ran into while looking at this.

jketema avatar Oct 20 '22 12:10 jketema

For functions I would like to move to similar behaviour as for classes: if given some function we can identify a single unique definition we "fold" all the declarations into that definition, if we cannot find such a definition we do nothing (so you'll see what you see now). The latter can happen when there's no definition, or when there are multiple implementations (across different binaries). This is of course an approximation of what is going on. We cannot statically analyse the dynamic loading behaviour.

We merged a PR that implements this behaviour. This should be available in CodeQL 2.16.0.

jketema avatar Dec 22 '23 12:12 jketema

We unfortunately had to revert the change due to performance problems. Re-opening this issue.

jketema avatar Jan 12 '24 10:01 jketema

@jketema , any idea on an ETA of a change? I.e., should we expect regressions in the near future on our end or do you think we will have a patch in before the next tag release/

bdrodes avatar Jan 12 '24 15:01 bdrodes

Can't say anything about the ETA. There shouldn't be any regressions if you use released versions. This was too late for the last 2.15.x release, and now just won't make it in the 2.16.0.

jketema avatar Jan 12 '24 15:01 jketema

We've merged a new version of the change https://github.com/github/codeql/pull/15421, which should make it into 2.16.2. Note that the change will only have an effect on databases created with 2.16.2 and later.

jketema avatar Feb 02 '24 09:02 jketema