crate Add a scalar function that takes a unfinished SQL statements and returns completion suggestions

Use case

Extending both the AdminUI and crash with semantic aware auto-completion, which uses a server to suggest schemas, tables, functions and keywords.

This could be done on the client side, but doing it in CrateDB itself has the advantage that not every client needs to re-implement the logic.

Feature description

For editors there is the Language server protocol that specifies a protocol for how a server can communicate with a client to provide rich features like autocomplete, goto definition and so on.

But for the use case described above adding a LSP implementation into CrateDB seems overkill. For crash or AdminUI editing session there is no workspace with files and so on.

An alternative could be a scalar function that takes a statement, line number and column number and returns rows with completion candidates.

The line number and column number would be for cases where a user modifies a statement:

SELECT 
  foo,
  bar.|
FROM
  tbl

| indicates the cursor position.

First step

This is what the estimate is about.

[ ] Next keywords based on ANTLR4 grammar / tokenizer

Follow ups

[ ] Schema names
[ ] Table names
[ ] Column names
[ ] Scalar functions
[ ] Available settings in different contexts. (SET, WITH clauses)

Resources

https://github.com/oranoran/antlr4-autosuggest
https://blog.rapid7.com/2015/06/29/how-to-implement-antlr4-autocomplete/
https://tomassetti.me/autocompletion-editor-antlr/
https://dev.to/bytebase/how-we-built-the-sql-autocomplete-framework-with-antlr4-a3p

Mar 27 '20 14:03 mfussenegger

Would it (in terms of latency) not be better to do this client-side? Using ANTLR4 and having the grammar/tokenizer ready would be rather language agnostic anyway.

JS/TS wise modeDBA (https://modeldba.com/) has a sql-autocomplete ready, which might be able to extend/contribute to? https://www.npmjs.com/package/sql-autocomplete

Nov 20 '20 08:11 proddata

Would it (in terms of latency) not be better to do this client-side?

In practice it doesn't really matter if we can respond fast enough to the requests. It depends of course on where the server is you're communicating too - but if it is close enough users shouldn't feel any sluggishness. CrateDB itself should be able to respond within ~10ms to such requests.

The language server protocol uses a server-client model and works well in practice.

We also want to extend the functionality in the future to include suggestions for column names, table names, settings and so on. This would require a client to talk to the server anyhow.

Using ANTLR4 and having the grammar/tokenizer ready would be rather language agnostic anyway.

ANTLR4 doesn't have parser generators for all languages and we'd also have to re-implement lots of logic that goes into the completion itself like the error recovery.

Doing that once on the server side is less effort for us and enables re-using the functionality across more clients.

I'd also like to built this as a library that is fairly de-coupled from the remaining CrateDB code, so that we could later on even create a full blown language server out of the completion library, so that we'd get the functionality into editors like vscode or vim.

Nov 20 '20 08:11 mfussenegger