cockroach
cockroach copied to clipboard
sql: new heuristic-based completion engine
This PR extends the server-side completion logic available under SHOW COMPLETIONS.
Informs #37355. Intended for use together with #86457, which would fix issue 37355.
The statement now returns 5 columns: completion, category, description, start, end. This change is backward-compatible with the CLI code in previous versions, which was not checking the number of columns.
It works roughly as follows:
-
first the input is scanned into a token array.
-
then the token array is translated to a sketch: a string with the same length as the token array, where each character corresponds to an input token and an extra character indicating where the completion was requested.
-
then the completion rules are applied in sequence. Each rule inspects the sketch (and/or the token sequence) and decides whether to do nothing (skip to next rule), or to run some SQL which will return a batch of completion candidates.
Each method operates exclusively on the token sequence, and does not require the overall SQL syntax to be valid or complete.
Also, the completion engine executes in a streaming fashion, so that there is no need to accumulate all the completion candidates in RAM before returning them to the client. This prevents excessive memory usage server-side, and offers the client an option to cancel the search once enough suggestions have been received.
Code organization:
-
new package
sql/compengine: the core completion engine, also where sketches are computed.Suggested reading:
api.goin the new package. -
new package
sql/comprules: the actual completion methods/heuristics, where sketches are mapped into SQL queries that provide the completion candidates.Suggested reading: the test cases under
testdata.
Some more words about sketches:
For example, SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;' produces the sketch "ii'ii.i;" where i indicates an
identifier-like token and ' indicates the cursor position.
The purpose of the sketch is to simplify the pattern matching performed by the completion heuristics, and enables the application of regular expressions on the token sequence.
For example, a heuristic to complete schema-qualified relations in
the current database, or a database-qualified relation, would use
the regexp [^.;]i\.(['_]|i'): an identifier not following a
period or a semicolon, followed by a period, followed by the
completion cursor (with the cursor either directly after the period
or on an identifier after the period).
Supersedes #45186.
@rickystewart when you get a chance (NOT URGENT) could you help with this?
I've defined a new generator rule in pkg/sql/scanner that generates token_names_test.go.
Because it's generated, I'm not adding it to the srcs in the go_test() invocation.
However every time I run dev generate bazel, bazel/gazelle auto-adds it to the srcs list. Then if I run generate go the build starts failing because the file exists both in my work tree and in the sandbox.
Do you know how I can prevent dev generate bazel from adding the generated file to srcs?
Do you know how I can prevent dev generate bazel from adding the generated file to srcs?
Yes -- add a comment to BUILD.bazel (probably the one in your package subdirectory) like # gazelle:exclude token_names_test.go.
thank you!
@rickystewart hi, for when you have time (NOT URGENT), this PR could use some help:
There is a new generated file in sql/scanner (token_names_test.go). The linter is now complaining that it cannot build/lint the code because this generated file is missing.
How can I add the generation of this file to the dependencies for the linter?
Oh nvm, found it -- it's because I forgot to update the Makefile.
This is now rebased, with more tests added, ready for review.
TFYRs!
bors r=rytaft,ZhouXing19