vscode-codeql
Handle large (>4GB) SARIF results files on reopen
Describe the bug
Support for large SARIF files was added in https://github.com/github/vscode-codeql/pull/1004, which "fixed" https://github.com/github/vscode-codeql/issues/735. However, this only works the first time. When the SARIF file already exists and the results are reloaded by clicking the history item, we once again get the error "Showing raw results instead of interpreted ones due to an error. Cannot create a string longer than 0x1fffffe8 characters".
It appears to me that the bug is in https://github.com/github/vscode-codeql/blob/main/extensions/ql-vscode/src/query-results.ts#L163-L166, which correctly uses a streaming SARIF parser if the file doesn't exist, but uses JSON.parse if it does (and the latter cannot handle large files). With that pinpointed, I hope the fix is easy.
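A minimal sketch of the control flow this suggests, in which both branches end in the same streaming loader. All names here (SarifIo, exists, generate, parseStreaming, loadInterpretedResults) are hypothetical and chosen for illustration; this is not the code in query-results.ts.

```typescript
// Hypothetical interface: the real extension's helpers may look different.
interface SarifIo<T> {
  // Does the interpreted SARIF file already exist on disk?
  exists(path: string): Promise<boolean>;
  // Produce the SARIF file (for example by asking the CLI to interpret results).
  generate(path: string): Promise<void>;
  // Parse the file incrementally instead of reading it into one string.
  parseStreaming(path: string): Promise<T>;
}

async function loadInterpretedResults<T>(path: string, io: SarifIo<T>): Promise<T> {
  if (!(await io.exists(path))) {
    await io.generate(path);
  }
  // Whether the file was just generated or was already there, it is read
  // through the streaming parser, never through JSON.parse on the whole file.
  return io.parseStreaming(path);
}
```

The point of the shape is that file existence only decides whether generation runs, not which parser is used.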
Version
CodeQL extension version: 1.6.9
CodeQL CLI version: 2.10.2
Platform: darwin x64
To reproduce
Run a query producing a large sarif file. Click the result item.
I think SensitiveInfoLog.ql on elasticsearch/elasticsearch ought to do the trick.
Expected behavior
No crashes, and the path results shown.
Additional context
This is a problem that is in a sense already fixed - we just need to make sure that all code paths actually use the fixed code rather than the broken JSON.parse.
See also https://github.slack.com/archives/C02SK2TJKPY/p1659692701907299
Started at https://github.com/github/vscode-codeql/pull/1457
I'm attempting to reproduce the original error and confirm the fix, but I've been stuck on the "Interpreting query results using CodeQL CLI" step for the longest time:
How long does this step usually take on your end?
[Screenshot: Screen Shot 2022-08-09 at 7 24 42 PM]
After I left for dinner and came back, it completed and I was able to reproduce 😄
How long does this step usually take on your end?
Several minutes for the large cases.
Hm.. when I ran the SensitiveInfoLog.ql query on elasticsearch it took well over 10 minutes. When I try to replicate the issue by reopening the results view afterwards, it also seems to hang.
I have some local changes that will add some progress logging, but that's blocked on https://github.com/github/vscode-codeql/issues/1459.
You should make sure that you're using multiple threads to calculate the paths; otherwise it really takes a long time. A good way to check whether you're actually using multiple threads for this is to run jstack on the process ID and see. You should see about 8 threads doing Dijkstra computations (they'll include something with com.semmle.util.graph.ComputePaths in their stack trace).
(Or I guess you could check your CPU usage to gauge the number of computing threads)
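If counting by eye is tedious, a small helper along these lines could tally the matching threads from saved jstack output. It is only a sketch: it assumes thread entries in the dump are separated by blank lines and start with a quoted thread name, and the sample dump is fabricated (only the class name comes from the comment above; the method name is a placeholder).

```typescript
// Count thread entries in a jstack dump whose stack mentions `needle`.
function countThreadsMentioning(jstackOutput: string, needle: string): number {
  return jstackOutput
    .split(/\r?\n\s*\r?\n/) // assumed: blank lines separate thread entries
    .filter((block) => /^\s*"/.test(block) && block.indexOf(needle) !== -1)
    .length;
}

// Fabricated sample for illustration; real dumps are much longer.
const sampleDump = [
  '"worker-1" #21 prio=5 runnable',
  "   java.lang.Thread.State: RUNNABLE",
  "\tat com.semmle.util.graph.ComputePaths.placeholder(Unknown Source)",
  "",
  '"worker-2" #22 prio=5 runnable',
  "   java.lang.Thread.State: RUNNABLE",
  "\tat com.semmle.util.graph.ComputePaths.placeholder(Unknown Source)",
  "",
  '"main" #1 prio=5 waiting',
  "\tat java.lang.Object.wait(Native Method)",
].join("\n");

const computeThreads = countThreadsMentioning(
  sampleDump,
  "com.semmle.util.graph.ComputePaths",
);
console.log(computeThreads); // prints 2 for the sample above
```

A count close to 1 would suggest the path computation is effectively single-threaded.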
When I try to replicate the issue by re-opening of the results view afterwards it also seems to hang.
Reopening a multi-GB sarif file does take a little bit of time, I guess.
Thank you for your patience, @aschackmull, while this PR was open and in progress. The fix is finally in: https://github.com/github/vscode-codeql/pull/1457