jupyter-ai

[v3-beta] Add glossary to developer docs

Open · dlqqq opened this issue 11 months ago · 0 comments

Problem

It's unclear how local variables, functions, and classes should be named, because no naming conventions have been established or documented. Some existing naming conventions were poorly chosen and make existing code difficult to read.

Although this issue may seem trivial, I believe that good, well-documented naming conventions can save days of effort for contributors when measured across years of development.

This issue serves two purposes:

  1. To track progress on adding contributor documentation regarding naming conventions in v3.
  2. To track proposals for new naming conventions in v3.

Contributors are absolutely welcome to offer feedback and contribute suggestions! Please leave them as comments here.

Proposed name changes

New term: chat model

In v2, "language model" generally referred to the model used in the chat. With the introduction of completion models, we need to reconsider the name "language model", as it's ambiguous whether the term refers to the LLM used in chat or the LLM used in completions.

For v3, we should prefer "chat model" as much as possible for the sake of clarity.

New terms: model names and model IDs

In v2, the term "model ID" ambiguously refers either to the value used by Jupyter AI (e.g. openai-chat:gpt-4o) or to the argument accepted by a provider class (e.g. gpt-4o). To distinguish the two, we previously called the former a global model ID (abbreviated gmid or gid) and the latter a local model ID (abbreviated lid or lmid).

  • Furthermore, in v2, variables carried a prefix indicating which kind of model an ID referred to: lm_id, lm_lid, and lm_gid for language models; em_id, em_gid, and em_lid for embedding models; and cm_id, cm_gid, and cm_lid for completion models.

I have found this very confusing (even though I set these conventions myself). The v2 definitions produce 9 different ways to label model IDs: three model types (lm, em, cm) times three ID variants (id, gid, lid).

For v3, I propose new definitions to eliminate ambiguity in the term "model ID":

  • Model name: the argument that identifies a model to a provider (e.g. gpt-4o).
  • Model ID: the argument that identifies a model to Jupyter AI (e.g. openai-chat:gpt-4o).
  • The definition of provider ID remains unchanged.
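Under these definitions, a model ID is a provider ID and a model name joined by a colon. The sketch below illustrates the relationship. The helper names (parse_model_id, make_model_id, ParsedModelId) are hypothetical, not existing Jupyter AI APIs. It also assumes the ID is split on the first colon only, so that a model name which itself contains a colon is kept intact.

```python
from typing import NamedTuple


class ParsedModelId(NamedTuple):
    """Components of a Jupyter AI model ID (hypothetical helper type)."""

    provider_id: str  # e.g. "openai-chat"
    model_name: str  # e.g. "gpt-4o"


def parse_model_id(model_id: str) -> ParsedModelId:
    """Split a model ID such as "openai-chat:gpt-4o" into its parts.

    Assumption: only the first colon separates the provider ID from the
    model name, so any later colons belong to the model name.
    """
    provider_id, sep, model_name = model_id.partition(":")
    if not sep or not provider_id or not model_name:
        raise ValueError(f"Expected '<provider_id>:<model_name>', got {model_id!r}")
    return ParsedModelId(provider_id, model_name)


def make_model_id(provider_id: str, model_name: str) -> str:
    """Inverse of parse_model_id: join a provider ID and a model name."""
    return f"{provider_id}:{model_name}"


# Usage with the v3 vocabulary: a chat model ID yields a provider ID
# and a chat model name.
chat_model_id = "openai-chat:gpt-4o"
provider_id, chat_model_name = parse_model_id(chat_model_id)
```

With this vocabulary, "ID" always means the Jupyter AI-facing value and "name" always means the provider-facing value, so no gid/lid qualifier is needed.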

Local variables should be renamed accordingly:

  • lm_gid => chat_model_id
  • lm_lid => chat_model_name
  • em_gid => embedding_model_id
  • em_lid => embedding_model_name
  • etc.
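As a rough illustration of how code might read after the renaming, here is a hypothetical settings object. The class ModelSelection and the example embedding model ID are invented for this sketch. The comments map each field back to its v2 name. It also assumes, as in the definitions above, that the model name is everything after the first colon of the model ID.

```python
from dataclasses import dataclass


@dataclass
class ModelSelection:
    """Hypothetical settings object written with the proposed v3 names."""

    chat_model_id: str  # v2: lm_gid, e.g. "openai-chat:gpt-4o"
    embedding_model_id: str  # v2: em_gid

    @property
    def chat_model_name(self) -> str:
        # v2: lm_lid; the part of the model ID after the first colon
        return self.chat_model_id.partition(":")[2]

    @property
    def embedding_model_name(self) -> str:
        # v2: em_lid; the part of the model ID after the first colon
        return self.embedding_model_id.partition(":")[2]


selection = ModelSelection(
    chat_model_id="openai-chat:gpt-4o",
    embedding_model_id="example-embeddings:example-embedding-model",  # hypothetical
)
```

Each variable name now states both which model it concerns (chat, embedding) and which form of identifier it holds (ID or name), without abbreviations.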

dlqqq, Nov 27 '24 00:11