secp256k1 Add simple static checker based on clang-query

This add a simple static checker based on clang-query, which is a tool that could be described as a "clever grep" for abstract syntax trees (ASTs).

As an initial proof of usefulness, this commit adds these checks:

No uses of floating point types.
No use of certain reserved identifiers (e.g., "mem(...)", #829).
No use of memcmp (#823).

The checks are easily extensible.

The main purpose is to run the checker on CI, and this commit adds the checker to the Travis CI script.

This currently requires clang-query version at least 10. (However, it's not required not compile with clang version 10, or with clang at all. Just the compiler flags must be compatible with clang.) Clang-query simply uses the clang compiler as a backend for generating the AST.

In order to determine the compile in which the code is supposed to be compiled (e.g., compiler flags such as -D defines and include paths), it reads a compilation database in JSON format. There are multiple ways to generate this database. The easiest way to obtain such a database is to use a tool that intercepts the make process and build the database.

On Travis CI, we currently use "bear" for this purpose. It's a natural choice because there is an Ubuntu package for it. If you want to run this locally, bear is a good choice but other tools such as compiledb (Python) are available.

Oct 15 '20 16:10 real-or-random

Here's another interesting one: match binaryOperator(allOf(unless(isExpansionInSystemHeader()), hasOperatorName("/")))

Unfortunately we use quite a few divisions, so this does not lead anywhere.

We could also check easily for reserved identifiers.

More suggestions?

Oct 15 '20 19:10 real-or-random

grml, apparently clang-query uses a compile_commands.json file to infer compile options/flags, which I had lying around in my local tree from some old experiments in 2018... So I need to rework this.

Oct 15 '20 21:10 real-or-random

Unfortunately we use quite a few divisions, so this does not lead anywhere.

Is there any way to tell if the divisor is a compile time constant? We generally don't want divisions ending up in the machine code (though if its in some setup routine it's not that big a deal) as they're non-constant time[1] and slow (e.g. 50-100 cycles on x86, worse on a lot of arm). ... but divisions with constants get converted by the compiler to multiples via strength reduction. I believe the codebase is free or nearly free of divisions with non-constant divisors (maybe there is one in scratch space sizing or something like that).

I've failed to yet fully impress my practice of always using masking over division onto Pieter yet... but there are some cases where masking doesn't easily work.

[1] also, stock valgrind ctime check will not catch divisions involving secret data. I have a pretty trivial patch to valgrind on my laptop so that it does... and it passed for me as of the last time I ran it. I'm not really eager to figure out how to make travis use a custom compiled copy of valgrind...

Oct 15 '20 22:10 gmaxwell

Unfortunately we use quite a few divisions, so this does not lead anywhere.

Is there any way to tell if the divisor is a compile time constant? We generally don't want divisions ending up in the machine code (though if its in some setup routine it's not that big a deal) as they're non-constant time[1] and slow (e.g. 50-100 cycles on x86, worse on a lot of arm). ... but divisions with constants get converted by the compiler to multiples via strength reduction. I believe the codebase is free or nearly free of divisions with non-constant divisors (maybe there is one in scratch space sizing or something like that).

Right, for example.

Match #103:

./src/ecmult_impl.h:1030:27: note: "root" binds here
    *n_batch_points = 1 + (n - 1) / *n_batches;
                          ^~~~~~~~~~~~~~~~~~~~

This is the reference for the matcher: https://clang.llvm.org/docs/LibASTMatchersReference.html

You can match on "constantExpr" but this is not what we want. This matches constant expression as seen from the point of view of the AST, i.e., this matches places where only constant expressions are allowed (such as after case).

This technique in general does not enable us to get insights on what the compiler will do with the code. It's really just pattern matching on the AST, so it's merely a clever grep. (One can do more advanced things with clang tooling but this requires more effort than writing matching expressions.)

What we could do is to check if the divisor is an integer literal, but even this is not precise we have also a few (x-1) divisors for example.

Oct 16 '20 13:10 real-or-random

This uses https://github.com/rizsotto/Bear, which intercepts make, to generate a JSON compilation database (see https://clang.llvm.org/docs/JSONCompilationDatabase.html). Bear is nice because we can easily install it on Travis. There are other tools that may work better when used locally, e.g., https://github.com/nickdiego/compiledb .

Oct 16 '20 14:10 real-or-random

Ready for review.

Travis fails but this is intentionally. See it in action here:

https://travis-ci.org/github/bitcoin-core/secp256k1/jobs/736511002#L1254 (complains about memczero declaration, see #829) https://travis-ci.org/github/bitcoin-core/secp256k1/jobs/736511002#L1313 (complains about reserved _t type declared in benchmarks)

Edit: argh, the links to the exact lines don't work due to the output folding. I might add a commit that restructures the Travis output in general by adding folds to the main commands: https://github.com/travis-ci/docs-travis-ci-com/issues/949#issuecomment-276755003 Then all the main commands be folded unless they fail, so see only the errors. Moreover we don't need the cats at the end (which have confused people in the past) and we don't need to ugly hack that I introduced here to make the color output work with cat.)

Oct 16 '20 21:10 real-or-random

This is pretty cool. I'll have to play with it a bit.

Oct 17 '20 07:10 sipa

I added a long commit message and updated the PR description to this message.

Oct 19 '20 15:10 real-or-random

FWIW

Names beginning with a capital ‘E’ followed a digit or uppercase letter Names that begin with either ‘is’ or ‘to’ followed by a lowercase letter Names that begin with ‘LC_’ followed by an uppercase letter Names of all existing mathematics functions suffixed with ‘f’ or ‘l’ Names that begin with ‘SIG’ followed by an uppercase letter Names that begin with ‘SIG_’ followed by an uppercase letter Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter Names that end with ‘_t’

Oct 22 '20 10:10 elichai

@elichai Yeah, where did you get this list from? Is this really all about all names? I thought E is only for macros. We can't check for macros using clang-query but we could add a simple grep for #define[[:space:]+]E etc. I also have uncommitted changes to add https://clang.llvm.org/docs/DiagnosticsReference.html#wreserved-id-macro if the compiler supports it (this covers only _ and __ macros). Not sure if overkill but it's not a lot of work.

Oct 22 '20 10:10 real-or-random

@elichai Yeah, where did you get this list from? Is this really all about all names? I thought E is only for macros.

https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

About E it's weird, in the gcc docs it says for additional error code names which are macros, and the exact wording of the standard is:(7.1.4)

Macros that begin with E and a digit or E and an uppercase letter (followed by any combination of digits, letters, and underscore) may be added to the declarations in the <ermo. h> header.

But on 7.1.3 "Reserved identifies" it says:

Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

This paper proposing to change that also agrees on that interpretation: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2572.pdf (See "Future Library Directions")

However, p1 makes it clear that all identifiers reserved from this subclause are reserved identifiers regardless of what header files are included, meaning that these rules apply to all C code

In effect, these identifiers are reserved for all uses in C regardless of what header files (if any) are included

EDIT: I realize now that you asked about identifier vs macro, and not about if it's related to errno.h or not. it sounds like it's only for macros but the lawyering here is delicate

Oct 22 '20 11:10 elichai

Standard on 6.8.3 Semantics:

The identifier immediately following the define is called the macro name

and also Aaron Ballman(the author of that paper, and part of WG14 committee) told me:

macros are defined using an identifier, so "macro name" and "identifier" are interchangable "reserved for use as a macro name" is just saying how it's intended to be used, not that macro names are a special thing

So yeah :/

Oct 22 '20 11:10 elichai

It makes sense: if some macro is going to get called E1234 then you obviously can't have a variable named E1234 without trouble.

Oct 22 '20 13:10 gmaxwell

secp256k1 secp256k1 copied to clipboard

Add simple static checker based on clang-query

secp256k1
secp256k1 copied to clipboard