cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

sql: new heuristic-based completion engine

Open knz opened this issue 3 years ago • 6 comments
trafficstars

This PR extends the server-side completion logic available under SHOW COMPLETIONS.

Informs #37355. Intended for use together with #86457, which would fix issue 37355.

The statement now returns 5 columns: completion, category, description, start, end. This change is backward-compatible with the CLI code in previous versions, which was not checking the number of columns.

It works roughly as follows:

  1. first the input is scanned into a token array.

  2. then the token array is translated to a sketch: a string with the same length as the token array, where each character corresponds to an input token and an extra character indicating where the completion was requested.

  3. then the completion rules are applied in sequence. Each rule inspects the sketch (and/or the token sequence) and decides whether to do nothing (skip to next rule), or to run some SQL which will return a batch of completion candidates.

    Each method operates exclusively on the token sequence, and does not require the overall SQL syntax to be valid or complete.

Also, the completion engine executes in a streaming fashion, so that there is no need to accumulate all the completion candidates in RAM before returning them to the client. This prevents excessive memory usage server-side, and offers the client an option to cancel the search once enough suggestions have been received.

Code organization:

  • new package sql/compengine: the core completion engine, also where sketches are computed.

    Suggested reading: api.go in the new package.

  • new package sql/comprules: the actual completion methods/heuristics, where sketches are mapped into SQL queries that provide the completion candidates.

    Suggested reading: the test cases under testdata.

Some more words about sketches:

For example, SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;' produces the sketch "ii'ii.i;" where i indicates an identifier-like token and ' indicates the cursor position.

The purpose of the sketch is to simplify the pattern matching performed by the completion heuristics, and enables the application of regular expressions on the token sequence.

For example, a heuristic to complete schema-qualified relations in the current database, or a database-qualified relation, would use the regexp [^.;]i\.(['_]|i'): an identifier not following a period or a semicolon, followed by a period, followed by the completion cursor (with the cursor either directly after the period or on an identifier after the period).

Supersedes #45186.

knz avatar Sep 08 '22 14:09 knz

This change is Reviewable

cockroach-teamcity avatar Sep 08 '22 14:09 cockroach-teamcity

@rickystewart when you get a chance (NOT URGENT) could you help with this?

I've defined a new generator rule in pkg/sql/scanner that generates token_names_test.go.

Because it's generated, I'm not adding it to the srcs in the go_test() invocation.

However every time I run dev generate bazel, bazel/gazelle auto-adds it to the srcs list. Then if I run generate go the build starts failing because the file exists both in my work tree and in the sandbox.

Do you know how I can prevent dev generate bazel from adding the generated file to srcs?

knz avatar Sep 08 '22 14:09 knz

Do you know how I can prevent dev generate bazel from adding the generated file to srcs?

Yes -- add a comment to BUILD.bazel (probably the one in your package subdirectory) like # gazelle:exclude token_names_test.go.

rickystewart avatar Sep 08 '22 16:09 rickystewart

thank you!

knz avatar Sep 08 '22 16:09 knz

@rickystewart hi, for when you have time (NOT URGENT), this PR could use some help:

There is a new generated file in sql/scanner (token_names_test.go). The linter is now complaining that it cannot build/lint the code because this generated file is missing.

How can I add the generation of this file to the dependencies for the linter?

knz avatar Sep 12 '22 10:09 knz

Oh nvm, found it -- it's because I forgot to update the Makefile.

knz avatar Sep 12 '22 10:09 knz

This is now rebased, with more tests added, ready for review.

knz avatar Dec 04 '22 16:12 knz

TFYRs!

bors r=rytaft,ZhouXing19

knz avatar Dec 06 '22 21:12 knz

Build failed (retrying...):

craig[bot] avatar Dec 06 '22 21:12 craig[bot]

Build succeeded:

craig[bot] avatar Dec 06 '22 22:12 craig[bot]