codeql-cli-binaries

codeql database create ignore minified files

Open bananabr opened this issue 3 years ago • 5 comments

Description

When codeql database create is used on a codebase containing files whose names contain multiple dots (.), such as foo.min.js, the files are ignored by the extractors.

Steps to reproduce

  1. Create a directory called test
  2. Create a file named foo.js inside the newly created directory
  3. Navigate to the directory and run the following command: codeql database create ../foo-db --language=javascript
  4. Verify the foo.js is extracted
  5. Rename the foo.js to foo.min.js
  6. Run the following command: codeql database create ../foo-min-db --language=javascript
  7. Verify that foo.min.js is NOT extracted

Expected behavior

When using codeql to assess production JavaScript code, minified files should be included in the database.

bananabr, Aug 12 '21 19:08

Greetings, thank you for reaching out to us with this issue. Having codeql database create ignore minified files is a deliberate design decision that we made. Unfortunately, if we did create a database with the minified files, the results we could obtain on them would be very difficult to interpret since there wouldn't be meaningful line numbers and variable names. Thus, we'd end up outputting a lot of Code Scanning alerts that would be very difficult to fix and would not really provide much value to users, which is something we want to avoid.

Ideally, you would run our tool on the non-minified version of your JavaScript code where the alerts that get produced will be much easier to understand and fix. If for some reason you do not have access to the non-minified code, I would suggest you run your JavaScript files through a pretty-printer to get back some meaningful line numbers and then use CodeQL on the pretty-printed files (making sure they are named .js rather than .min.js so that they get picked up).

edoardopirovano, Aug 16 '21 10:08

That is basically what I've been doing.

Here is a one-liner to rename the files in case anyone needs the same thing.

find `pwd` -iname "*.js" -exec sh -c 'x="{}"; dname=`dirname $x`; bname=`basename $x`; mv $x "${dname}/`echo -n ${bname}|sed s,\\\.,_,g|sed s,_/,\\\./,g`.js"' \;
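A more defensive variant of the same idea might look like the sketch below. It is not the one-liner above: it assumes that only files ending in .min.js need renaming (the one-liner touches every .js file), it matches the suffix case-sensitively, and it rewrites foo.min.js to foo_min.js. The function name rename_minified is hypothetical.

```shell
#!/bin/sh
# Rename every *.min.js file under a directory to *_min.js so that the
# name no longer ends in .min.js.
# rename_minified is a hypothetical helper name, not part of CodeQL.
rename_minified() {
    root="${1:-.}"
    # "-exec ... {} +" hands the paths over as arguments, so names that
    # contain spaces or quotes are not re-parsed by the inner shell.
    find "$root" -type f -name '*.min.js' -exec sh -c '
        for path in "$@"; do
            target="${path%.min.js}_min.js"
            # Skip instead of overwriting a file that already exists.
            if [ -e "$target" ]; then
                echo "skipping $path: $target already exists" >&2
                continue
            fi
            mv -- "$path" "$target"
        done
    ' sh {} +
}
```

Calling rename_minified test before codeql database create would rename the files in the test directory from the reproduction steps. Files without the .min.js suffix are left alone.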

Thanks,

bananabr, Aug 16 '21 13:08

Thanks for sharing that one-liner! In terms of supporting this use case more directly, while we definitely don't want to always index .min.js files, perhaps the JavaScript extractor could be configured to do this via some suitable configuration option (maybe an environment variable). I'll leave it to the @github/codeql-javascript team to decide whether this is something they would like to implement or if this issue should be closed as something we won't do.

edoardopirovano, Aug 16 '21 14:08

We are working on ways to pass configuration options to extractors in a more organised fashion. As a current workaround, can you try setting the environment variable LGTM_INDEX_FILTERS="include:*.min.js" in the shell before running codeql database create? This will tell the CodeQL JavaScript autobuilder not to exclude those files by default. (source)

(I second @edoardopirovano's concerns about the quality of alerts you may find in minified code, but this should get the behaviour you're asking for.)
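Put together, the suggested workaround might be wrapped as in the sketch below. The filter value and the codeql database create arguments are copied from this thread; the wrapper name create_db_with_minified is hypothetical, and whether the filter has the desired effect has not been verified here.

```shell
#!/bin/sh
# Create a JavaScript database with the LGTM_INDEX_FILTERS workaround applied.
# create_db_with_minified is a hypothetical wrapper name; the filter value
# is taken verbatim from the comment above.
create_db_with_minified() {
    db_path="$1"
    # The assignment prefix sets the variable for this one invocation only.
    LGTM_INDEX_FILTERS='include:*.min.js' \
        codeql database create "$db_path" --language=javascript
}
```

For the reproduction steps in the description, this would be run from inside the test directory as create_db_with_minified ../foo-min-db.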

adityasharad, Aug 16 '21 15:08

I have successfully found real-world vulnerabilities in public VDPs by inspecting minified files with CodeQL, running them through a prettifier first and renaming the .min files. If CodeQL could somehow perform those steps itself, that would be great for black/gray-box assessments.

bananabr, Aug 16 '21 17:08