codeql icon indicating copy to clipboard operation
codeql copied to clipboard

C++: Recognize allocation functions heuristically

Open MathiasVP opened this issue 3 years ago • 6 comments
trafficstars

This PR does two things:

  • It adds a couple of new interface classes, HeuristicAllocationExpr and HeuristicAllocationFunction, that complement the already existing AllocationExpr and HeuristicAllocation classes with functions that we think may perform allocations.
  • It switched the new cpp/invalid-pointer-deref query to use HeuristicAllocationExpr instead of AllocationExpr as a source of flow.

Currently, the heuristics used for detecting an HeuristicAllocationFunction is:

  1. The function name matches %alloc%
  2. It returns a pointer type
  3. We can find a unique parameter of unsigned integral type

In addition, a HeuristicAllocationFunction is recognized as a realloc-like function if, in addition to the above, it:

  1. Matches the name %realloc%, and
  2. We can find a unique parameter of a pointer type

This finds a bunch of new allocation functions on the databases I've tried. For example, on php we recognize the following set of new allocation functions:

Screenshot 2022-09-29 at 20 04 07

I've manually verified that these are all implementations that we want to recognize as allocation functions.

Commit-by-commit review encouraged.

  • https://github.com/github/codeql/commit/a9710453f4366ebb999b6e383e4cbc71ec364d51 adds the new classes by wrapping a parameterized module around the logic for generating an AllocationExpr from a call to an AllocationFunction.
  • https://github.com/github/codeql/commit/d12a76559a44708274446cb4527da862ebaf5dcb uses the new class in cpp/invalid-pointer-deref.
  • https://github.com/github/codeql/commit/2a514d60d448b3ef1ef0d7810c66cf158b3973a9 adds a isBarrierIn predicate to the product dataflow library. This is necessary since many of the heuristic allocators are just simple wrappers around allocation expressions that we already know, and we'd otherwise get duplicated paths from starting flow both inside those functions, and from the return value of the function we now recognize as a heuristic allocation expression.

Finally, I had to add nomagic to a predicate in the product dataflow library because I was observing a bad join due to bad magic.

With this PR, we now recognize https://nvd.nist.gov/vuln/detail/CVE-2016-10160 out of the box 🎉.

MathiasVP avatar Sep 29 '22 19:09 MathiasVP

I like the idea but I'm not following the parameterized module tricks you're doing.

rdmarsh2 avatar Sep 29 '22 20:09 rdmarsh2

I like the idea but I'm not following the parameterized module tricks you're doing.

Thanks! I'll add some comments to explain what's going on. The code looks slightly uglier than I wanted to because we cannot yet have type signature member predicates (although it's coming very soon!). So instead of writing the code like:

private signature class CallAllocationExprTarget extends Function {
  int getReallocPtrArgSig();

  int getSizeArgSig();

  int getSizeMultSig();

  predicate requiresDeallocSig();
}

private module CallAllocationExprBase<CallAllocationExprTarget Target> {
  class CallAllocationExprImpl instanceof FunctionCall {
    ...
  }
}

we have to write it as:

private signature class CallAllocationExprTarget extends Function;

private module CallAllocationExprBase<CallAllocationExprTarget Target> {
  signature int getReallocPtrArgSig(Target target);

  signature int getSizeArgSig(Target target);

  signature int getSizeMultSig(Target target);

  signature predicate requiresDeallocSig(Target target);

  module With<
  getReallocPtrArgSig/1 getReallocPtrArg, getSizeArgSig/1 getSizeArg, getSizeMultSig/1 getSizeMult,
  requiresDeallocSig/1 requiresDealloc> {
    class CallAllocationExprImpl instanceof FunctionCall {
      ...
    }
  }
}

MathiasVP avatar Sep 30 '22 08:09 MathiasVP

cc @ginsbach on parameterized module usage (you can restrict your 👀 to the first commit only)

MathiasVP avatar Sep 30 '22 09:09 MathiasVP

cc @ginsbach on parameterized module usage (you can restrict your 👀 to the first commit only)

Always happy to see parameterised module PRs! Wouldn't it be easier to use a module signature, rather than 4 predicate signature arguments for With<>?

ginsbach avatar Sep 30 '22 09:09 ginsbach

Wouldn't it be easier to use a module signature, rather than 4 predicate signature arguments for With<>?

I ... I need to lookup my parameterized module syntax, then. I very much think you're right!

MathiasVP avatar Sep 30 '22 09:09 MathiasVP

Wouldn't it be easier to use a module signature, rather than 4 predicate signature arguments for With<>?

Fixed in https://github.com/github/codeql/pull/10635/commits/483ff58c39ec8704862705bffab7aef63950883f. I suppose this rewrite will also make it a lot easier for @rdmarsh2 to see what's going on.

MathiasVP avatar Sep 30 '22 09:09 MathiasVP