codeql-cli-binaries
Single predicate causing bottleneck in queries
codeql query run --threads=0 still results in only one core being utilized, which severely affects query speed. I typically see this in my log file:
Creating executor with (many) threads. ... (0s) Starting to evaluate predicate ...
With 100% usage on 1 core.
Is Visual Studio more performant for this?
I tried it in Visual Studio Code and things got worse - it still uses only one thread, and it does not seem to benefit from a disk cache unlike the CLI.
This is unfortunately often the case, especially when running a single query. The granularity of parallelism in the QL evaluator is a single predicate (or a single group of mutually recursive predicates). So if there is only one predicate ready to run (and everything else in the query depends directly or indirectly on its results), then only one thread will be working on it.
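To make the granularity point concrete, here is a toy model in Python. It is only a sketch of dependency-driven scheduling in general, not the QL evaluator's actual scheduler: the predicate names are hypothetical, and each node stands for one predicate (or one group of mutually recursive predicates). It groups nodes into "waves", where every node in a wave has all of its dependencies in earlier waves, so the size of a wave is the most threads that could be busy at that point.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ready_waves(deps):
    """Group nodes into waves of work that could run at the same time.

    deps maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has finished
    return waves


# Hypothetical query: every other predicate needs "base" first.
bottleneck_query = {
    "base": set(),
    "a": {"base"},
    "b": {"base"},
    "c": {"base"},
    "result": {"a", "b", "c"},
}

waves = ready_waves(bottleneck_query)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {len(wave)} runnable -> {wave}")
```

In this model the first wave contains only `base`, so one thread is busy no matter how many are configured; the extra threads only become useful once `base` has finished.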
We do have an ambition of finding some parallelization opportunities within predicate evaluations, but making that work within the overall structure of the QL evaluator is a hard problem, so there's no timeline for when this will bear fruit.
Hi Makholm,
Thank you for your reply. To the best of my understanding, a query eventually decomposes into predicates of the form table(...). Parallelizing just those might lead to significant improvements, and it appears to be the easiest part to parallelize (compared with the complex logic needed for recursive predicates, fixed points and so on).
Hi @sad-dev. The QL evaluator will already evaluate the predicates that make up a query in parallel where possible, when the number of threads is configured as you have done. However, as @hmakholm describes above, it is only possible to evaluate two (or more) predicates in parallel when there are no dependencies between them.
In some cases when evaluating a query, there is a single predicate that is required by all the remaining predicates, so the evaluator has to finish evaluating that single predicate first, before it can parallelise any remaining work. This usually explains the bottleneck you observe. I expect that when evaluation of that first predicate completes, more of the remaining work will be done in parallel.
Could you share which query you are running, and the name of the single predicate you observe? The name appears in the "Starting to evaluate predicate" log message from the CLI or in the "Running query" progress message in VS Code. With that information, we can explain in a little more detail. In particular, it will help us tell whether that predicate is a bottleneck dependency for the rest of the query, or whether there is actually room for us to do more in parallel.
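If the log is long, a small script can pull out the predicate names for you. The following is a minimal sketch that relies only on the "Starting to evaluate predicate" phrase quoted in this thread; the function names and the sample log lines are made up for illustration, and the real log layout may differ, so treat whatever follows the phrase as raw text rather than a guaranteed predicate name.

```python
import re

# Only this phrase comes from the thread; the rest of each line is captured as-is.
STARTED = re.compile(r"Starting to evaluate predicate\s+(.*)")


def predicates_started(log_text):
    """Return, in order, the text following each 'Starting to evaluate predicate'."""
    found = []
    for line in log_text.splitlines():
        match = STARTED.search(line)
        if match:
            found.append(match.group(1).strip())
    return found


def last_started(log_text):
    """Return the most recently started predicate, or None if there is none."""
    started = predicates_started(log_text)
    return started[-1] if started else None


# Hypothetical log excerpt, not real CLI output.
sample_log = """\
Creating executor with 8 threads.
(0s) Starting to evaluate predicate examplePredicateOne
(3s) Starting to evaluate predicate examplePredicateTwo
"""

print(last_started(sample_log))
```

When only one core is busy, the last predicate started is a reasonable first guess for the one to report, though several predicates may have been started before the log went quiet.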