
.json files not included in database

Status: Open. GunnarHakansson opened this issue 3 years ago (7 comments).

When creating a database with JavaScript as the language, I expect .json files to be included, but they are not.

Steps to reproduce:

  1. Create a folder t
  2. Create the following files in it
    1. dummy.js (this file is only here to suppress the warning that no code was found)
      (function(){window.console.log("")})();
      
    2. file.json
        {
            "M": "6"
        }
      
  3. Create the database: codeql database create -s t -l javascript output
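For convenience, the reproduction folder from the steps above can also be created with a small Node.js script. This is only a sketch: createReproFolder is a hypothetical helper name, and it writes exactly the two files listed in the steps, nothing more.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recreate the reproduction folder "t" inside `root` and return its path.
function createReproFolder(root) {
  const dir = path.join(root, "t");
  fs.mkdirSync(dir, { recursive: true });

  // dummy.js only exists so the JavaScript extractor finds some code at all.
  fs.writeFileSync(
    path.join(dir, "dummy.js"),
    '(function(){window.console.log("")})();\n'
  );

  // file.json is the file we expect, but fail, to find in the database.
  fs.writeFileSync(
    path.join(dir, "file.json"),
    JSON.stringify({ M: "6" }, null, 4) + "\n"
  );
  return dir;
}
```

Calling createReproFolder(".") and then running codeql database create -s t -l javascript output from the same directory should reproduce the behavior described here.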

The created database contains dummy.js, but not file.json.

I tested this with both CLI 2.10.0 and 2.10.1.

Question: Is this a bug? Or am I wrong when I assumed this is supposed to work? Are there any simple workarounds?

If you are wondering why I want this to work, the reason is that I want to write my own CodeQL queries to check some JSON configuration files.

GunnarHakansson, Aug 02 '22 09:08

We're not including all JSON files from the source tree in the database for efficiency reasons -- most JSON files in most projects are not relevant for CodeQL, and some projects have large amounts of JSON data that would bog down a standard analysis.

To extend the list of files to extract, create a file myconfig.yml containing, e.g.:

paths:
 - .              # a root directory must always be explicitly named :-/
 - data/foo.json  # include the named file
 - "**/bar*.json" # npm-like path patterns can also be used

and then give a --codescanning-config=myconfig.yml argument to codeql database create.

(This doesn't work for all languages, but it does work for JavaScript.)
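CodeQL's actual path matcher is not shown in this thread, so the following is only an approximation, assuming common npm-style glob semantics: "**/" spans zero or more directories and "*" stays within a single path segment. It can help sanity-check which repository-relative paths a pattern such as "**/bar*.json" would select. globToRegExp is a hypothetical helper, not part of CodeQL, and the real matcher may differ in edge cases.

```javascript
// Hypothetical helper: translate a simplified npm-style glob into a RegExp
// that is matched against a repository-relative path using "/" separators.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      re += "(?:.*/)?"; // "**/": zero or more whole directories
      i += 2;
    } else if (c === "*" && glob[i + 1] === "*") {
      re += ".*"; // trailing "**": anything, including "/"
      i += 1;
    } else if (c === "*") {
      re += "[^/]*"; // "*": anything within one path segment
    } else if (c === "?") {
      re += "[^/]"; // "?": exactly one non-separator character
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + re + "$");
}
```

With this sketch, "**/bar*.json" matches bar.json and a/b/bar-x.json but not foo/baz.json, while "data/foo.json" matches only that exact path.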

hmakholm, Aug 02 '22 12:08

I've now tried to create a config file, and I can't get it to work. CodeQL scans the entire drive for codeql-workspace.yml and .codeqlmanifest.json. Then it fails with:

A fatal error occurred: codeql/javascript-queries is not a .ql file, .qls file, a directory, or a query pack specification.

I have tried multiple combinations of:

  • Setting the CLI argument --search-path to the CodeQL CLI folder
  • Setting disable-default-queries: true in the config file
  • Adding the CodeQL CLI folder to paths in the config file

But nothing really works.

If I set disable-default-queries: true and specify packs, I get the following error:

ERROR: Referenced pack 'codeql/javascript-all' not found. (D:\codeql-tests\queries\qlpack.yml:1,1-1)
A fatal error occurred: Could not resolve library path for D:\codeql-tests\queries

That is as far as I get. Could you provide some more help with how to make a valid config file?

GunnarHakansson, Aug 02 '22 16:08

Huh, that sounds strange; those shouldn't be possible consequences of the procedure I described at all.

Did you confuse the config file I spoke about with the config file you can put in ~/.config/codeql/config? They are completely different.

I don't think it's even possible for codeql database create to emit a complaint about codeql/javascript-queries is not a .ql file ... What exactly are you doing? Can you provide a full transcript of the terminal session that exhibits the behavior you're seeing, starting from when you create the database until you see the error message?

hmakholm, Aug 03 '22 13:08

I may have confused the config files, but you should be able to tell from the details below. I haven't changed any central configuration files, so if I don't specify --codescanning-config I still get a database, just one that doesn't contain any .json files.

Terminal:

PS D:\codeql-tests> D:\codeql\codeql.exe database create --language javascript --codescanning-config=./config.yml -v t6
Writing logs to D:\codeql-tests\t6\log\database-create-20220803.161636.969.log.
Initializing database at D:\codeql-tests\t6.
Counting lines of code in D:\codeql-tests
Resolving extractor javascript.
Successfully loaded extractor JavaScript (javascript) from D:\codeql\javascript.
Created skeleton CodeQL database at D:\codeql-tests\t6. This in-progress database is ready to be populated by an extractor.
A fatal error occurred: codeql/javascript-queries is not a .ql file, .qls file, a directory, or a query pack specification.
PS D:\codeql-tests>

Contents of D:\codeql-tests\config.yml:

name: "CodeQL config2"
paths:
  - 'D:/codeql-tests/t'

D:\codeql is where I have put the CLI folder.

Log file (attached): D:\codeql-tests\t6\log\database-create-20220803.161636.969.log

GunnarHakansson, Aug 03 '22 14:08

Okay, I stand corrected.

I wasn't aware codeql database create would attempt to resolve default queries for a language when one of those config files is used -- in my mental model that would only happen when you try to run queries. The log file attached showed me the right place in the code to look.

It looks like the config-file feature is not as well tested as I thought for use outside a CI environment, where the CodeQL library packs are already downloaded and available. (My local test setup resembles that environment closely enough that I wasn't hit by the problem when I tried it myself.)

If you have a local checkout of https://github.com/github/codeql, that is where you should point your --search-path -- not to the CLI folder.
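Putting the two suggestions together (a code scanning config that lists the extra paths, plus --search-path pointing at a local checkout of https://github.com/github/codeql rather than the CLI folder), the invocation could be assembled as below. renderPathsConfig and buildConfigArgs are hypothetical helper names; the flags are the ones used in this thread, and, as the thread itself notes next, this may still not be enough to resolve the pack error.

```javascript
// Hypothetical helper: render a config file with a "paths" list.
// JSON-quoted strings are valid YAML double-quoted scalars, so glob
// characters like "*" need no further escaping.
function renderPathsConfig(paths) {
  return ["paths:", ...paths.map((p) => `  - ${JSON.stringify(p)}`)].join("\n") + "\n";
}

// Hypothetical helper: arguments for `codeql database create` that use the
// config file and resolve library packs from a local github/codeql checkout
// (not from the CLI folder).
function buildConfigArgs(sourceRoot, databasePath, configFile, codeqlCheckout) {
  return [
    "database",
    "create",
    "--language=javascript",
    `--source-root=${sourceRoot}`,
    `--codescanning-config=${configFile}`,
    `--search-path=${codeqlCheckout}`,
    databasePath,
  ];
}
```

The rendered text can be written to myconfig.yml with fs.writeFileSync, and the argument array passed to child_process.spawnSync together with the path to the codeql binary.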

But if even that doesn't work, let's forget the config file feature for now and use a lower-level hack to get you unblocked instead:

  • When running codeql database create, first set the environment variable LGTM_INDEX_FILTERS to include:**/*.json, or a similar path matcher.
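If CodeQL is driven from a script, the environment-variable workaround can be scoped to a single invocation instead of being set globally. This is a sketch in Node.js, assuming codeql (or the full path to codeql.exe) is passed as the binary; buildCreateInvocation and createDatabaseWithJson are hypothetical names, and the filter value is exactly the one suggested above.

```javascript
const { spawnSync } = require("node:child_process");

// Build the command, arguments and environment for `codeql database create`
// with LGTM_INDEX_FILTERS set so that .json files are extracted as well.
function buildCreateInvocation(sourceRoot, databasePath, codeqlBinary = "codeql") {
  return {
    command: codeqlBinary,
    args: ["database", "create", "--language=javascript", `--source-root=${sourceRoot}`, databasePath],
    // Copy the current environment and add the filter only for this child process.
    env: { ...process.env, LGTM_INDEX_FILTERS: "include:**/*.json" },
  };
}

// Run the invocation, streaming CodeQL's output to this terminal,
// and return the CLI's exit code.
function createDatabaseWithJson(sourceRoot, databasePath, codeqlBinary) {
  const { command, args, env } = buildCreateInvocation(sourceRoot, databasePath, codeqlBinary);
  const result = spawnSync(command, args, { env, stdio: "inherit" });
  if (result.error) throw result.error;
  return result.status;
}
```

For the setup in this thread that would be createDatabaseWithJson("t", "output", "D:\\codeql\\codeql.exe"). Interactively in PowerShell, the equivalent is $env:LGTM_INDEX_FILTERS = "include:**/*.json" before running codeql database create.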

hmakholm, Aug 03 '22 15:08

Setting LGTM_INDEX_FILTERS to include:**/*.json works as expected, and I have now managed to write and test one query on my local machine. I haven't tested CI yet, but I will get to it soon.

Regarding the solution, you might want to update the documentation around .json files. I would expect this page to mention that .json extraction is opt-in: https://codeql.github.com/docs/codeql-overview/supported-languages-and-frameworks/ But maybe you want to make some other changes first, since LGTM_INDEX_FILTERS wasn't your first choice?

I'll leave it to you to decide if this issue should be closed or kept open. Either way, I'm happy.

Thank you for the help!

GunnarHakansson, Aug 04 '22 11:08

That's a good point -- the documentation table does indeed give the impression it should have worked right out of the box. I'll poke a bit around internally to see if that can be improved.

hmakholm, Aug 04 '22 14:08