Handle large (>4GB) SARIF results files on reopen

Open aschackmull opened this issue 2 years ago • 8 comments

Describe the bug
Support for large SARIF files was added in https://github.com/github/vscode-codeql/pull/1004, which "fixed" https://github.com/github/vscode-codeql/issues/735. However, this only works once. When the SARIF file already exists and the results are reloaded by clicking the history item, we once again get the error "Showing raw results instead of interpreted ones due to an error. Cannot create a string longer than 0x1fffffe8 characters".
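For context, the limit in that message looks like Node's maximum string length, which Node exposes as `constants.MAX_STRING_LENGTH` in the `buffer` module. A minimal sketch of a size check follows; the function name `fitsInSingleString` is made up for illustration and is not part of the extension.

```typescript
import { constants } from "buffer";

// Hypothetical helper, not taken from vscode-codeql.
// Conservative check: decoding UTF-8 never yields more UTF-16 code units
// than the file has bytes, so a file at or below the limit always fits.
// A file above the limit is treated as too large for an in-memory JSON.parse.
function fitsInSingleString(sizeInBytes: number): boolean {
  return sizeInBytes <= constants.MAX_STRING_LENGTH;
}
```

A multi-GB SARIF file fails this check, so it cannot be read into one string and handed to JSON.parse.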

It appears to me that the bug is in https://github.com/github/vscode-codeql/blob/main/extensions/ql-vscode/src/query-results.ts#L163-L166, which correctly uses a streaming SARIF parser if the file doesn't exist, but uses JSON.parse if it does (and the latter cannot handle large files). With that pinpointed, I hope the fix is easy.
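Roughly, the shape I have in mind is sketched below. This is not the extension's actual code: `SarifIo`, `SarifSummary`, and `loadInterpretedResults` are hypothetical names, and the real interpreting and parsing steps are passed in as stand-ins. The only point is that both branches end in the streaming parser.

```typescript
// Hypothetical types and names for illustration; none come from vscode-codeql.
type SarifSummary = { resultCount: number };

interface SarifIo {
  // Reports whether the SARIF file is already on disk.
  exists(path: string): Promise<boolean>;
  // Stand-in for asking the CLI to interpret results and write the SARIF file.
  interpret(path: string): Promise<void>;
  // Stand-in for a streaming parser that never builds one giant string.
  parseStreaming(path: string): Promise<SarifSummary>;
}

async function loadInterpretedResults(
  path: string,
  io: SarifIo
): Promise<SarifSummary> {
  // Only the creation of the file depends on whether it already exists.
  if (!(await io.exists(path))) {
    await io.interpret(path);
  }
  // First open and reopen share the same streaming code path.
  return io.parseStreaming(path);
}
```

With this shape there is no branch left where an existing file could be routed to JSON.parse.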

Version
CodeQL extension version: 1.6.9
CodeQL CLI version: 2.10.2
Platform: darwin x64

To reproduce
Run a query producing a large SARIF file. Click the result item. I think SensitiveInfoLog.ql on elasticsearch/elasticsearch ought to do the trick.

Expected behavior
No crashes, and the path results shown.

Additional context
This is a problem that is in a sense already fixed. We just need to make sure that all code paths actually use the fixed code rather than the broken JSON.parse.

See also https://github.slack.com/archives/C02SK2TJKPY/p1659692701907299

aschackmull avatar Aug 05 '22 11:08 aschackmull

Started at https://github.com/github/vscode-codeql/pull/1457

angelapwen avatar Aug 09 '22 16:08 angelapwen

I'm attempting to reproduce the original error and confirm the fix, but I have been stuck on the "Interpreting query results using CodeQL CLI" step for a very long time: [screenshot: Screen Shot 2022-08-09 at 6 57 14 PM]

How long does this step usually take on your end?

angelapwen avatar Aug 09 '22 16:08 angelapwen

[screenshot: Screen Shot 2022-08-09 at 7 24 42 PM]

After I left for dinner and came back, it completed and I was able to reproduce 😄

angelapwen avatar Aug 09 '22 17:08 angelapwen

How long does this step usually take on your end?

Several minutes for the large cases.

aschackmull avatar Aug 10 '22 08:08 aschackmull

How long does this step usually take on your end?

Several minutes for the large cases.

Hm... when I ran the SensitiveInfoLog.ql query on elasticsearch, it took well over 10 minutes. When I try to replicate the issue by reopening the results view afterwards, it also seems to hang.

angelapwen avatar Aug 10 '22 13:08 angelapwen

I have some local changes that will add some progress logging, but that's blocked on https://github.com/github/vscode-codeql/issues/1459. You should make sure that you're using multiple threads to calculate the paths - otherwise it really takes a long time. A good way to check whether you're actually using multiple threads for this is to run jstack on the process id and see. You should see about 8 threads doing Dijkstra computations (they'll include something with com.semmle.util.graph.ComputePaths in their stack trace).

aschackmull avatar Aug 10 '22 13:08 aschackmull

(Or I guess you could check your CPU usage to gauge the number of computing threads)

aschackmull avatar Aug 10 '22 13:08 aschackmull

When I try to replicate the issue by re-opening of the results view afterwards it also seems to hang.

Reopening a multi-GB sarif file does take a little bit of time, I guess.

aschackmull avatar Aug 10 '22 13:08 aschackmull

Thank you for your patience, @aschackmull, while this PR was open and in progress. The fix is finally in: https://github.com/github/vscode-codeql/pull/1457

angelapwen avatar Oct 24 '22 19:10 angelapwen