Compress release bundle with zstandard/zstd to reduce size

Open DSmithVA opened this issue 1 year ago • 1 comments

I propose a .zstd download option alongside the existing .gz one for Linux releases. For the latest 2.18.1 linux64 bundle, using zstd instead of gzip cuts about 33% off the file size, from 822.8 MiB down to 553.2 MiB.

Example command to convert the existing .gz:

zcat codeql-bundle-linux64.tar.gz | zstd --long=27 -9 -o codeql-bundle-linux64.tar.zstd

File sizes:

862823301 codeql-bundle-linux64.tar.gz
580124258 codeql-bundle-linux64.tar.zstd
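As a quick check of the figures above, the byte counts can be converted to MiB and the saving computed directly. This is only a sketch of the arithmetic; the helper names are made up for illustration.

```python
# Byte counts copied from the file listing above.
GZ_BYTES = 862_823_301
ZSTD_BYTES = 580_124_258


def to_mib(size_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return size_bytes / 2**20


def reduction(before: int, after: int) -> float:
    """Fraction of the original size that was saved."""
    return 1 - after / before


if __name__ == "__main__":
    print(f"gzip: {to_mib(GZ_BYTES):.2f} MiB")
    print(f"zstd: {to_mib(ZSTD_BYTES):.2f} MiB")
    print(f"saved: {reduction(GZ_BYTES, ZSTD_BYTES):.1%}")
```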

For zstd arguments, compression levels above -9 saw diminishing returns, though -19 does get down to 504.5 MiB while taking 12x longer to compress. Using higher --long= values improves compression, but 27 is the highest value that clients can process by default, per https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md?plain=1#L162
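To make the --long=27 limit concrete: the value is the base-2 logarithm of the match window, so 27 means a 128 MiB window. The sketch below only does that arithmetic. The constant and function names are my own, and the ceiling of 27 is taken from the zstd manual linked above.

```python
# Largest window log a zstd decoder accepts without extra flags,
# per the zstd manual linked above.
DEFAULT_MAX_WINDOW_LOG = 27


def window_bytes(window_log: int) -> int:
    """Size in bytes of the match window selected by --long=<window_log>."""
    return 1 << window_log


def decoder_needs_flag(window_log: int) -> bool:
    """True if decompression needs --long=<n> or --memory=<size> passed explicitly."""
    return window_log > DEFAULT_MAX_WINDOW_LOG


if __name__ == "__main__":
    for log in (27, 28, 31):
        mib = window_bytes(log) // 2**20
        print(f"--long={log}: {mib} MiB window, extra decoder flag: {decoder_needs_flag(log)}")
```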

Compression with xz is also an improvement, but it is noticeably slower. Either is an improvement over plain .gz, and any recent Linux will support both .zstd and .xz for decompression.

DSmithVA avatar Jul 31 '24 13:07 DSmithVA

Thanks for your feedback. We'll take this into consideration.

jketema avatar Jul 31 '24 13:07 jketema

This was implemented, but it is now failing on GitHub Enterprise because the base Docker images running in ARC don't include zstd in any of their tags, and the current v3 tag points to a version that requires zstd.

So CodeQL can't be initialized in the default runner... I don't see any release of the runner with zstd in https://github.com/actions/runner/blob/main/images/Dockerfile ...

marcellodesales avatar Jan 21 '25 02:01 marcellodesales

That is unfortunate. Also, the .zst archives being created are much larger than necessary since the --long=27 flag was not used. For the most recent linux bundle I get a 25% smaller file:

curl -LO https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.1/codeql-bundle-linux64.tar.zst
stat -c %s codeql-bundle-linux64.tar.zst
608400767
cat codeql-bundle-linux64.tar.zst | zstd -d | zstd --long=27 | wc -c
455054249

DSmithVA avatar Jan 21 '25 02:01 DSmithVA

@marcellodesales : According to our engineers, the job should only download the .zst bundle when zstd exists on the path, falling back to tar if it doesn't. Can I please ask you to rerun the job in debug mode, and upload the log files if possible, for us to better debug the issue?
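The intended behaviour described here can be sketched roughly as follows. This is not the action's actual code: it assumes that "falling back to tar" means falling back to the gzip-compressed bundle, and `pick_bundle` is a hypothetical helper. The asset names follow the linux64 naming used earlier in this thread.

```python
import shutil
from typing import Callable, Optional

# Asset names as they appear in this thread for the linux64 bundle.
ZSTD_ASSET = "codeql-bundle-linux64.tar.zst"
GZIP_ASSET = "codeql-bundle-linux64.tar.gz"


def pick_bundle(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Prefer the zstd bundle only when a zstd executable is on PATH."""
    if which("zstd") is not None:
        return ZSTD_ASSET
    # Assumption: without zstd, the gzip bundle is the fallback.
    return GZIP_ASSET


if __name__ == "__main__":
    print(pick_bundle())
```

Passing `which` as a parameter keeps the lookup easy to fake, e.g. `pick_bundle(lambda name: None)` simulates a runner image without zstd.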

hvitved avatar Jan 21 '25 12:01 hvitved

@DSmithVA : Thanks a lot for bringing this to our attention; we are currently testing this approach, and it does indeed look promising.

hvitved avatar Jan 21 '25 12:01 hvitved

@hvitved I have posted the info below at https://github.com/github/codeql-action/issues/2705#issuecomment-2605344817 as well... All languages fail with the latest version...

🔧 Settings

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3.28.1
        with:
          debug: true
          languages: go
          build-mode: "manual"
          config-file: .github/codeql-config.yml

⌨ Logs

##[debug]Evaluating condition for step: 'Initialize CodeQL'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Initialize CodeQL
##[debug]Register post job cleanup for action: github/codeql-action/init@v3.28.1
##[debug]Loading inputs
##[debug]Evaluating: secrets.ACCESS_TOKEN
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'ACCESS_TOKEN'
##[debug]=> null
##[debug]Result: null
##[debug]Evaluating: github.token
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'token'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: toJson(matrix)
##[debug]Evaluating toJson:
##[debug]..Evaluating matrix:
##[debug]..=> null
##[debug]=> 'null'
##[debug]Result: 'null'
##[debug]Loading env
Run github/codeql-action/init@v3.28.1
  
Job run UUID is 0cde5708-94e4-46a6-80e2-deb7dfb95ff0.
##[debug]Running git command: git rev-parse HEAD
##[debug]Sending status report: {"action_name":"init","action_oid":"unknown","action_ref":"v3.28.1","action_started_at":"2025-01-21T17:23:55.998Z","action_version":"3.28.1","analysis_key":".github/workflows/codeql-golang.yml:analyze","commit_oid":"d8d1429bce66e76202d14b5cc22251b91dfaa91f","first_party_analysis":true,"job_name":"analyze","job_run_uuid":"0cde5708-94e4-46a6-80e2-deb7dfb95ff0","ref":"refs/pull/104/merge","runner_os":"Linux","started_at":"2025-01-21T17:23:55.998Z","status":"starting","steady_state_default_setup":false,"testing_environment":"","workflow_name":"codeQL-golang","workflow_run_attempt":2,"workflow_run_id":2678337,"actions_event_name":"pull_request","runner_available_disk_space_bytes":7419637760,"runner_total_disk_space_bytes":8589934592,"matrix_vars":"null","runner_arch":"X64"}
::group::Setup CodeQL tools
Setup CodeQL tools
  ##[debug]Found tar.
  ##[debug]Could not find zstd: Error: Unable to locate executable file: zstd. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.
  /usr/bin/tar --version
  tar (GNU tar) 1.34
  Copyright (C) 2021 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  
  Written by John Gilmore and Jay Fenlason.
  Found gnu tar version 1.34.
  ##[debug]Attempting to obtain CodeQL tools. CLI version: 2.20.1, bundle tag name: codeql-bundle-v2.20.1, URL: unspecified.
  ##[debug]isExplicit: 2.20.1
  ##[debug]explicit? true
  ##[debug]checking cache: /home/runner/_work/_tool/CodeQL/2.20.1/x64
  ##[debug]not found
  ##[debug]Didn't find a version of the CodeQL tools in the toolcache with a version number exactly matching 2.20.1.
  ##[debug]Found the following versions of the CodeQL tools in the toolcache: [].
  ##[debug]Didn't find any versions of the CodeQL tools starting with 2.20.1 in the toolcache. Trying next fallback method.
  ##[debug]Computed a fallback toolcache version number of 2.20.1 for CodeQL version 2.20.1.
  ##[debug]isExplicit: 2.20.1
  ##[debug]explicit? true
  ##[debug]checking cache: /home/runner/_work/_tool/CodeQL/2.20.1/x64
  ##[debug]not found
  Did not find CodeQL tools version 2.20.1 in the toolcache.
  ##[debug]Did not find any candidate pinned versions of the CodeQL tools in the toolcache.
  Found CodeQL bundle in github/codeql-action on https://git.company.com with URL https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565.
  Using CodeQL CLI version 2.20.1 sourced from https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565 .
  ##[debug]Providing an authorization token to download CodeQL tools.
  ##[debug]Not running against github.com. Disabling all toggleable features.
  ##[debug]Writing feature flags to /home/runner/_work/_temp/cached-feature-flags.json
  ##[debug]Feature 'extract_to_toolcache' undefined in API response.
  ##[debug]Feature extract_to_toolcache is disabled due to its default value.
  Downloading CodeQL tools from https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565 . This may take a while.
  Streaming the extraction of the CodeQL bundle.
  ##[debug]Extracting to /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244. Input stream has high water mark 4194304.
  tar -x --zstd --warning=no-unknown-keyword --overwrite -f - -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244
  tar (grandchild): zstd: Cannot exec: No such file or directory
  tar (grandchild): Error is not recoverable: exiting now
  tar: Child died with signal 13
  tar: Error is not recoverable: exiting now
  ##[debug]Cleaning up extraction destination directory.
  ##[debug]Cleaned up extraction destination directory.
  Warning: Failed to download and extract CodeQL bundle using streaming with error: Error while downloading and extracting tar: Error: write EPIPE
  Warning: Falling back to downloading the bundle before extracting.
  ##[debug]Cleaning up CodeQL bundle.
  Warning: Failed to clean up CodeQL bundle: no files found matching /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244.
  ##[debug]Downloading https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565
  ##[debug]Destination /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a
  ##[debug]set auth
  ##[debug]download complete
  Finished downloading CodeQL bundle to /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a (11.1s).
  Extracting CodeQL bundle.
  ##[debug]Extracting to /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244.
  tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244
  tar (child): zstd: Cannot exec: No such file or directory
  tar (child): Error is not recoverable: exiting now
  tar: Child returned status 2
  tar: Error is not recoverable: exiting now
  ##[debug]Cleaning up extraction destination directory.
  ##[debug]Cleaned up extraction destination directory.
  ##[debug]Cleaning up CodeQL bundle archive.
  ##[debug]Cleaned up CodeQL bundle archive.
  Error: Unable to download and extract CodeQL CLI: Failed to run "tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244". Exit code was 2 and last log line was: n/a. See the logs for more details.
  
  Details: Error: Failed to run "tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244". Exit code was 2 and last log line was: n/a. See the logs for more details.
      at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)
      at ChildProcess.emit (node:events:519:28)
      at ChildProcess._handle.onexit (node:internal/child_process:294:12)
  ##[debug]Running git command: git rev-parse HEAD
  ##[debug]Sending status report: {"action_name":"init","action_oid":"unknown","action_ref":"v3.28.1","action_started_at":"2025-01-21T17:23:55.998Z","action_version":"3.28.1","analysis_key":".github/workflows/codeql-golang.yml:analyze","commit_oid":"d8d1429bce66e76202d14b5cc22251b91dfaa91f","first_party_analysis":true,"job_name":"analyze","job_run_uuid":"0cde5708-94e4-46a6-80e2-deb7dfb95ff0","ref":"refs/pull/104/merge","runner_os":"Linux","started_at":"2025-01-21T17:23:55.998Z","status":"aborted","steady_state_default_setup":false,"testing_environment":"","workflow_name":"codeQL-golang","workflow_run_attempt":2,"workflow_run_id":2678337,"actions_event_name":"pull_request","runner_available_disk_space_bytes":7419633664,"runner_total_disk_space_bytes":8589934592,"cause":"Unable to download and extract CodeQL CLI: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n\nDetails: Error: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n    at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)\n    at ChildProcess.emit (node:events:519:28)\n    at ChildProcess._handle.onexit (node:internal/child_process:294:12)","exception":"Error: Unable to download and extract CodeQL CLI: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n\nDetails: Error: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n    at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)\n    at ChildProcess.emit (node:events:519:28)\n    at ChildProcess._handle.onexit (node:internal/child_process:294:12)\n    at setupCodeQL (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/codeql.js:150:15)\n    at async initCodeQL (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/init.js:55:97)\n    at async run (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/init-action.js:175:34)\n    at async runWrapper (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/init-action.js:436:9)","completed_at":"2025-01-21T17:24:08.201Z","matrix_vars":"null","runner_arch":"X64"}
  ##[debug]Node Action run completed with exit code 1
  ##[debug]CODEQL_ACTION_FEATURE_MULTI_LANGUAGE='false'
  ##[debug]CODEQL_ACTION_FEATURE_SANDWICH='false'
  ##[debug]CODEQL_ACTION_FEATURE_SARIF_COMBINE='true'
  ##[debug]CODEQL_ACTION_FEATURE_WILL_UPLOAD='true'
  ##[debug]CODEQL_ACTION_VERSION='3.28.1'
  ##[debug]CODEQL_ACTION_WARNED_ABOUT_VERSION='true'
  ##[debug]JOB_RUN_UUID='0cde5708-94e4-46a6-80e2-deb7dfb95ff0'
  ##[debug]CODEQL_ACTION_INIT_HAS_RUN='true'
  ##[debug]CODEQL_ACTION_ANALYSIS_KEY='.github/workflows/codeql-golang.yml:analyze'
  ##[debug]CODEQL_WORKFLOW_STARTED_AT='2025-01-21T17:23:55.998Z'
  ##[debug]CODEQL_ACTION_JOB_STATUS='JOB_STATUS_FAILURE'
  ##[debug]Save intra-action state persisted_inputs = [["INPUT_DEBUG","true"],["INPUT_LANGUAGES","go"],["INPUT_BUILD-MODE","manual"],["INPUT_CONFIG-FILE",".github/codeql-config.yml"],["INPUT_QUERIES","security-extended,security-and-quality"],["INPUT_EXTERNAL-REPOSITORY-TOKEN",""],["INPUT_TOOLS",""],["INPUT_TOKEN","***"],["INPUT_REGISTRIES",""],["INPUT_MATRIX","null"],["INPUT_DB-LOCATION",""],["INPUT_CONFIG",""],["INPUT_PACKS",""],["INPUT_SETUP-PYTHON-DEPENDENCIES",""],["INPUT_SOURCE-ROOT",""],["INPUT_RAM",""],["INPUT_THREADS",""],["INPUT_DEBUG-ARTIFACT-NAME",""],["INPUT_DEBUG-DATABASE-NAME",""],["INPUT_TRAP-CACHING",""],["INPUT_DEPENDENCY-CACHING",""]]
  ##[debug]Finishing: Initialize CodeQL

marcellodesales avatar Jan 21 '25 17:01 marcellodesales

@henrymercer : Is the log above sufficient for you to debug? I notice the line

##[debug]Could not find zstd: Error: Unable to locate executable file: zstd. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.

which suggests that we should be detecting that zstd is not present?
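One way such a check could be made robust, sketched here as an assumption rather than as what the action does: the asset URL in the log (.../releases/assets/5565) carries no file extension, so a preflight could sniff the archive's magic bytes and report which external decompressor tar would need but cannot find. The function names are hypothetical; the magic numbers are the standard zstd frame and gzip signatures.

```python
import shutil
from typing import Callable, Optional

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number (0xFD2FB528, little-endian)
GZIP_MAGIC = b"\x1f\x8b"          # gzip member signature


def detect_compression(header: bytes) -> str:
    """Classify an archive from its first bytes: 'zstd', 'gzip', or 'unknown'."""
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


def missing_decompressor(
    header: bytes,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Name of the tool the archive needs but PATH lacks, or None if nothing is missing."""
    kind = detect_compression(header)
    if kind == "unknown":
        return None
    # The executables are named after the formats: 'zstd' and 'gzip'.
    return None if which(kind) else kind
```

With a check like this, the missing tool could be reported before extraction starts, e.g. `missing_decompressor(open(path, "rb").read(4))` returning `"zstd"` on a runner image without it.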

hvitved avatar Jan 22 '25 09:01 hvitved

Thanks for the debug logs @marcellodesales. https://github.com/github/codeql-action/pull/2710 should fix this issue. I'll let you know once this is available in a stable release — this should be ready by the end of the week.

henrymercer avatar Jan 22 '25 16:01 henrymercer

@marcellodesales The fix is now released as part of v3.28.3. I've asked in the other thread whether you would be able to verify the fix by updating to the latest version of the CodeQL Action.

@DSmithVA Thanks again for bringing this to our attention. CodeQL Bundle v2.20.4 will ship with a reduced bundle size.

I'll close this issue.

henrymercer avatar Jan 23 '25 12:01 henrymercer

@henrymercer Thank you for providing it... I will verify this week and report back! I did create a PR in the runner project to get the base image with zstd for faster execution https://github.com/actions/runner/pull/3670

marcellodesales avatar Jan 23 '25 17:01 marcellodesales

@henrymercer We still don't have a base runner image with zstd, as discussed before: https://github.com/actions/runner/pull/3670. As it stands, we get a very slow CodeQL bootstrap.

$ docker run -ti remote-ghcr.docker.artifactory.viasat.com/actions/actions-runner:2.328.0 zstd
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "zstd": executable file not found in $PATH: unknown

Run 'docker run --help' for more information

marcellodesales avatar Oct 03 '25 23:10 marcellodesales