vscode-codeql
Report suspicious join orders
This PR adds support for scanning the evaluator log to detect suspicious join orders that may be causing performance problems in a query. The actual detection algorithm is cut-and-pasted, and slightly refactored, from DCA. I've attempted to keep the portions of the code that scan the log clean of any dependencies on VS Code, so that we can eventually share the relevant code between the extension and DCA.
The algorithm computes a numeric metric for each relation in the RA, attempting to estimate how expensive the actual join order is compared to an ideal join order. Based on usage in DCA, we've set the threshold metric at 50. Anything higher than that is flagged as suspicious. In practice, most good join orders tend to have metrics in the single digits, and most bad join orders have metrics a couple orders of magnitude higher than that.
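As a rough illustration of the thresholding step only (not the metric computation itself, which lives in join-order.ts), the check could look something like the sketch below. The `RelationMetric` shape, the constant name, and `findSuspiciousRelations` are all hypothetical names made up for this example; only the threshold value of 50 comes from the description above.

```typescript
// Hypothetical shape: the PR does not show how per-relation metrics are stored.
interface RelationMetric {
  relationName: string;
  metric: number;
}

// Threshold value taken from the PR description; the constant name is an assumption.
const SUSPICIOUS_JOIN_ORDER_THRESHOLD = 50;

// Returns the relations whose metric is strictly above the threshold, worst first.
function findSuspiciousRelations(metrics: RelationMetric[]): RelationMetric[] {
  return metrics
    .filter((m) => m.metric > SUSPICIOUS_JOIN_ORDER_THRESHOLD)
    .sort((a, b) => b.metric - a.metric);
}

// Made-up sample values: a single-digit "good" join, a borderline one, and a bad one.
const flagged = findSuspiciousRelations([
  { relationName: "goodJoin", metric: 3 },
  { relationName: "borderline", metric: 50 },
  { relationName: "badJoin", metric: 4200 },
]);
```

With "anything higher than that" read as a strict comparison, a relation sitting exactly at 50 is not flagged, so only `badJoin` ends up in `flagged`.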
Relevant classes/interfaces include:
Non-VSCode types/functions
These types don't depend on VS Code, so they could be reused in DCA to share much of the implementation.
- EvaluationLogScanner - Handles events from scanning a single JSON log summary.
- EvaluationLogScannerProvider - Factory for EvaluationLogScanners.
- EvaluationLogScannerSet - Registry of scanner providers. When a log is scanned, this creates an instance of each registered scanner and passes the individual log records to each scanner.
- EvaluationLogProblemReporter - Callback interface used by scanners to report errors/warnings.
- JoinOrderScannerProvider - Implementation of the scanner interfaces that detects suspicious join orders.
- generateSummarySymbolsFile() - Generates a JSON file that maps the name of each relation from a human-readable summary to the location of its RA. This currently depends on the exact format of the human-readable summary, so I'd like to move this into the CLI itself once it's working well.
- readJsonlFile() - Reads a file in our particular flavor of "human-readable JSONL", invoking a callback for each top-level object. Note that true JSONL separates objects with a single newline, requiring each object to be on a single line. Our flavor separates objects with a double newline, so that single newlines can be used within an object. This function is currently just a minor refactoring of what we already used to parse the logs for the log viewer, so it still reads the entire file into memory at once. The interface will allow us to change the implementation to stream objects, however.
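To make the double-newline convention concrete, here is a minimal sketch of the splitting logic. It is not the actual `readJsonlFile()` implementation: the function name `parseDoubleNewlineJsonl` is hypothetical, and it works on an in-memory string rather than a file so that the example stays self-contained. It also assumes that no object contains a blank line of its own.

```typescript
// Hypothetical helper: splits text on blank lines and parses each chunk as one JSON value.
// Assumes pretty-printed objects never contain an empty line themselves.
function parseDoubleNewlineJsonl(
  text: string,
  handler: (value: unknown) => void
): void {
  // A double newline (with optional carriage returns) separates top-level objects.
  for (const chunk of text.split(/\r?\n\r?\n/)) {
    // Skip empty chunks produced by trailing or repeated separators.
    if (chunk.trim().length === 0) {
      continue;
    }
    handler(JSON.parse(chunk));
  }
}

// Two pretty-printed objects separated by a blank line (made-up sample data).
const sample = '{\n  "a": 1\n}\n\n{\n  "b": [2,\n    3]\n}\n';
const records: unknown[] = [];
parseDoubleNewlineJsonl(sample, (value) => records.push(value));
```

Like the current implementation described above, this holds the whole input in memory. Because callers only see the callback, a streaming version could replace the body without changing them.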
VS Code types
- LogScannerService - Connects an EvaluationLogScannerSet to all the right VS Code events and UI. It scans the log whenever the current query history item changes, or when the current query completes. All reported problems are routed to the Problems view in VS Code.
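The fan-out that the service relies on (one scanner instance per registered provider, each receiving every log record, with problems flowing back through a reporter callback) can be sketched without any VS Code dependency. All interface and method names below (`ProblemReporter`, `LogScanner`, `LogScannerProvider`, `ScannerSet`, `onEvent`, `onDone`, `createScanner`, `reportProblem`) are hypothetical stand-ins and are not the signatures used in this PR.

```typescript
// Hypothetical callback interface for reporting problems found in a log.
interface ProblemReporter {
  reportProblem(severity: "error" | "warning", message: string): void;
}

// Hypothetical scanner: receives each log record, then a completion signal.
interface LogScanner {
  onEvent(event: unknown): void;
  onDone(): void;
}

// Hypothetical factory that creates a fresh scanner for each scan.
interface LogScannerProvider {
  createScanner(reporter: ProblemReporter): LogScanner;
}

// Registry that instantiates every registered scanner and feeds it all records.
class ScannerSet {
  private readonly providers: LogScannerProvider[] = [];

  register(provider: LogScannerProvider): void {
    this.providers.push(provider);
  }

  scan(events: unknown[], reporter: ProblemReporter): void {
    const scanners = this.providers.map((p) => p.createScanner(reporter));
    for (const event of events) {
      for (const scanner of scanners) {
        scanner.onEvent(event);
      }
    }
    for (const scanner of scanners) {
      scanner.onDone();
    }
  }
}

// Toy provider that only counts records; a real one would inspect them.
const countingProvider: LogScannerProvider = {
  createScanner(reporter) {
    let count = 0;
    return {
      onEvent: () => { count++; },
      onDone: () => reporter.reportProblem("warning", `saw ${count} events`),
    };
  },
};

const messages: string[] = [];
const scannerSet = new ScannerSet();
scannerSet.register(countingProvider);
scannerSet.scan([{ id: 1 }, { id: 2 }], {
  reportProblem: (severity, message) => messages.push(`${severity}: ${message}`),
});
```

In the extension, the reporter would presumably forward to the Problems view, while a DCA integration could supply a different reporter without touching the scanners.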
The best I was able to come up with for testing was a unit test that scans a log fixture and expects the proper warning.
Note to reviewers: The actual implementation of the join order metric probably doesn't need a very thorough review, since it's just a slightly reorganized version of the existing DCA code. @MathiasVP you might want to take a look at join-order.ts just to see what I did, though.
I believe that I've addressed or responded to all feedback. Should be ready for re-review.
The bad-join-order-detection code in join-order.ts LGTM now! In fact, I hope we can backport some of these lovely improvements to DCA at some point (or even share some of the code in a glorious future).