
[WIP] Elixir code completion

Open jonastemplestein opened this issue 1 year ago • 6 comments

The aim of this PR is to eventually offer Elixir inline code completion within Livebook.

The high-level design is as follows:

  • When users stop typing in a code cell, they will see ghost text suggesting what they might want to type next. Hitting tab inserts that text
  • We will train a code model (using the Python ecosystem)
  • We will run inference in Livebook using either Bumblebee (in case of a beefy GPU with lots of memory) or otherwise a llama.cpp NIF (in which case we could use quantised models and CPU inference)
  • Users will be able to select which model to run, and Livebook will download it for them
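The backend choice in the third bullet could be sketched roughly like this (module name, function, and the hardware heuristic are all hypothetical, not Livebook's real API):

```elixir
defmodule Livebook.Copilot.Backend do
  @moduledoc """
  Illustrative sketch: pick an inference backend based on hardware.
  """

  # Rough heuristic: prefer Bumblebee when a GPU with enough memory is
  # available, otherwise fall back to a (hypothetical) llama.cpp NIF
  # running a quantised model on the CPU. The 16 GB threshold is made up.
  def choose(%{gpu?: true, gpu_memory_gb: mem}) when mem >= 16, do: :bumblebee
  def choose(_hardware), do: :llama_cpp_nif
end
```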

For more context on the project, see this random document with (slightly outdated) notes.

Status

This is a very minimal implementation of Copilot-style code completion.

At the moment the only LLM supported is the GPT-4 API. Set the OPENAI_API_KEY env var to play around with it.

Inline completion should appear 500ms after you stop typing, or use Ctrl + Space to force it to appear.
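For the GPT-4 prototype, the request could be assembled along these lines (the prompt wording, parameters, and module name are illustrative, not what this PR actually sends):

```elixir
defmodule Completion.OpenAI do
  # Builds a request body for OpenAI's chat completions endpoint.
  # The system prompt and sampling parameters here are made up for
  # illustration; the PR's real prompt may differ.
  def request_body(prefix, suffix) do
    %{
      model: "gpt-4",
      messages: [
        %{
          role: "system",
          content: "Complete the Elixir code at the cursor. Reply with code only."
        },
        %{role: "user", content: prefix <> "<cursor>" <> suffix}
      ],
      max_tokens: 64,
      temperature: 0.2
    }
  end
end
```

This body would then be POSTed to `https://api.openai.com/v1/chat/completions` with the `OPENAI_API_KEY` as a bearer token, e.g. via `Req.post!(url, json: body, auth: {:bearer, key})`.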

TODO in Livebook

Frontend polish

  • [ ] Don't debounce completion when the keyboard shortcut is used
  • [ ] Implement stop-words logic like Tabby does
  • [ ] Handle line breaks better (e.g. when the infilled code is meant to start with a newline because the cursor is at the end of a comment line)
  • [ ] Don't show completions in certain situations (e.g. empty editor, cursor at the beginning of a non-empty line, etc.)
  • [ ] Completions currently don't show when there is an intellisense suggestion. I'd try to make these independent and have [tab] always accept the code completion and [return] always accept the intellisense suggestion (like the Cursor editor does)
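The stop-words item could start as something like the following: truncate the raw model output at the first stop sequence so the ghost text doesn't run past the current expression. The stop list here is a guess, not Tabby's actual one:

```elixir
defmodule Completion.StopWords do
  # Truncate a raw model completion at the earliest stop sequence,
  # similar in spirit to Tabby's stop-words handling. The default
  # stop list below is illustrative only.
  @stops ["\n\n", "\nend\n", "```"]

  def truncate(completion, stops \\ @stops) do
    # :binary.match/2 accepts a list of patterns and returns the
    # leftmost match among them.
    case :binary.match(completion, stops) do
      {pos, _len} -> binary_part(completion, 0, pos)
      :nomatch -> completion
    end
  end
end
```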

Livebook plumbing

  • [ ] Allow the user to select a model and download that model
  • [x] Communicate to the user what's going on while the model is a) being downloaded and b) being loaded (and error states)
  • [ ] Give the LLM context from the whole notebook (not just the current cell)
  • [ ] Add a completion cache to avoid hitting the LLM unnecessarily
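The completion cache could be as simple as an ETS table keyed by a hash of the cursor context, so repeated requests for the same prefix/suffix skip the LLM. A minimal sketch (module name hypothetical; a real cache would also bound its size and expire entries):

```elixir
defmodule Completion.Cache do
  # Tiny read-through cache: key is a hash of the {prefix, suffix}
  # pair, value is the completion. Sketch only.
  def new, do: :ets.new(:completion_cache, [:set, :public])

  def fetch(table, prefix, suffix, fun) do
    key = :erlang.phash2({prefix, suffix})

    case :ets.lookup(table, key) do
      [{^key, completion}] ->
        completion

      [] ->
        # Cache miss: call the (expensive) LLM and remember the result.
        completion = fun.()
        :ets.insert(table, {key, completion})
        completion
    end
  end
end
```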

Model inference

  • [x] Llama.cpp HTTP API for rapid prototyping against a local llama.cpp
  • [ ] Llama.cpp NIF for inference
  • [x] Bumblebee model implementation

Tests!

TODO for fine-tuning a model

The hardest task is actually fine-tuning the model:

  • [x] Acquire large amounts of Elixir code (and remove PII etc. - could perhaps use The Stack plus the source code of Elixir core and a few other open source projects)
  • [ ] Turn the code into fill-in-the-middle training examples (can use GPT-4 to e.g. generate comments to be used in those examples)
  • [ ] Fine-tune a bunch of different models to see which one performs best (be mindful that they each have different infilling formats - this document has a list of LLMs to evaluate)
  • [ ] Create a mechanism for evaluating fill-in-the-middle performance
  • [ ] Implement Bumblebee model loaders for the most promising models
  • [ ] Produce model files to be used by Livebook - both for Bumblebee (HF transformers format) and quantised GGUF
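Constructing a fill-in-the-middle training example boils down to cutting a span out of a source file and rearranging it around the model's special tokens. A sketch (the tokens below follow the StarCoder convention; other models use different tokens and orderings, so check each model's tokenizer before training):

```elixir
defmodule FIM do
  # Turn a source file into one fill-in-the-middle training example by
  # cutting out the span [from, from + len) as the "middle". Uses
  # StarCoder-style special tokens purely as an illustration.
  def example(source, from, len) do
    prefix = binary_part(source, 0, from)
    middle = binary_part(source, from, len)
    suffix = binary_part(source, from + len, byte_size(source) - from - len)

    # Prefix-Suffix-Middle (PSM) ordering: the model sees prefix and
    # suffix, then learns to generate the middle.
    "<fim_prefix>" <> prefix <> "<fim_suffix>" <> suffix <> "<fim_middle>" <> middle
  end
end
```

In a real pipeline the cut points would be sampled randomly per file, and the special tokens must map to single tokenizer tokens rather than being split into plain text, which is exactly the tokenisation pitfall mentioned below.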

One of the most fiddly bits seems to be properly tokenising the special infilling tokens (both in Bumblebee and llama.cpp) - the models often output garbage if you get this wrong. There is some good context in these llama.cpp threads [1] [2]

jonastemplestein avatar Nov 09 '23 15:11 jonastemplestein

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 09 '23 15:11 CLAassistant

Ah sorry, I meant to open this under my own fork. Shall I move it there?

jonastemplestein avatar Nov 09 '23 15:11 jonastemplestein

Feel free to leave it here for people to play with :)

josevalim avatar Nov 09 '23 16:11 josevalim

Just to give a little update on this:

  • I think we will most likely want to use a fine-tune of bumblebee-1.3b (and maybe 6.7b for beefier machines)
  • I got a bit stuck last week trying to fine-tune the model but learned a lot about how the models actually work

Will hopefully have a model that is demonstrably better than bumblebee-1.3b by the end of the week

jonastemplestein avatar Nov 27 '23 08:11 jonastemplestein

Bumblebee or deepseekr? :)

josevalim avatar Nov 27 '23 09:11 josevalim

I would deeply love it if, when I added a @doc above a function definition, it would autocomplete:

```elixir
@doc ~S"""
Describe this function here.

## Examples

    iex> ThisModule.this_function("CREATE party\n")
    {:ok, {:create, "party"}}

"""
```

I forget the indentation and keywords needed for doctests to work. They are super useful in Livebook.

nickkaltner avatar May 25 '24 12:05 nickkaltner