
Switch to `llm` for LLM abstraction layer

Open · tomdyson opened this issue 10 months ago · 1 comment

Currently, microllama uses openai and langchain directly for interacting with language models and managing embeddings/vector stores. We should investigate switching to Simon Willison's llm library (https://llm.datasette.io/) as a more general abstraction layer. This could simplify the codebase, broaden model support via llm's plugin ecosystem, and leverage features llm already provides, such as API key management and streaming.

(comment authored by Gemini 2.5 Pro)

tomdyson · Apr 21 '25 08:04

Okay, here's a revised plan focusing on abstracting the chat model provider using llm while keeping Langchain for embeddings/indexing for now:

  1. Dependencies:

    • Add llm to pyproject.toml.
    • Add llm plugins for any additional chat model providers, e.g. llm-gemini. (OpenAI chat models are bundled with llm itself, so no plugin is needed for them; we can add other plugins later or instruct the user.)
    • Keep langchain, openai, faiss-cpu (or similar), tiktoken as they are still needed for the embedding/RAG pipeline.
    • Install/update dependencies.
  2. Refactor Chat Functions (answer, streaming_answer):

    • Import llm.
    • Get the desired chat model instance using llm.get_model(MODEL). MODEL (from env var) would now refer to an llm model ID/alias (e.g., "gpt-3.5-turbo", "gpt-4", potentially "gemini-pro" if configured).
    • Replace the openai.ChatCompletion.create(...) call with the llm model's prompt() method, passing the system message via its system= keyword (for multi-turn exchanges, llm offers model.conversation()).
    • Adapt the prompt_messages structure if needed: llm's prompt() takes a single prompt string plus a system= argument rather than a list of role-tagged messages.
    • Update the streaming logic in streaming_answer to iterate over the response stream provided by the llm model.
  3. Configuration:

    • Instruct users on how to set API keys using llm keys set (e.g., llm keys set openai ...). The OPENAI_API_KEY env var will still be needed by langchain for embeddings.
    • The MODEL environment variable remains relevant but now specifies the model for llm to use.
  4. No Changes to Indexing/Embeddings: Leave create_documents_from_texts, get_text_chunks, get_index, find_similar_docs as they are. They will continue to use langchain and OpenAIEmbeddings.

  5. Documentation/Instructions: Update README, make_dockerfile, and deploy_instructions to mention the need for llm, how to set API keys via llm keys set ..., and that the OPENAI_API_KEY environment variable is still required for embeddings.
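As a sketch of step 1, the dependency changes in pyproject.toml might look something like this; llm-gemini is just one example of an optional extra provider plugin, and the exact package list is an assumption, not microllama's actual manifest:

```toml
[project]
dependencies = [
    # existing deps, still needed for the embedding / RAG pipeline:
    "langchain",
    "openai",
    "faiss-cpu",
    "tiktoken",
    # new: abstraction layer for chat models
    "llm",
    # optional example plugin for a non-OpenAI provider
    "llm-gemini",
]
```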

This approach isolates the change to the chat completion logic, achieving the goal of provider flexibility there, while deferring the more complex change of replacing the embedding/vector store pipeline.
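Under the assumptions above about llm's Python API (llm.get_model(), model.prompt(system=...), and responses that stream when iterated), the chat refactor could be sketched roughly as follows. The function names mirror microllama's answer/streaming_answer, but the signatures, prompt template, and SYSTEM_PROMPT are illustrative, not the project's actual code:

```python
import os

# MODEL is now an llm model ID or alias (e.g. "gpt-4", "gemini-pro");
# reading it from an env var mirrors microllama's existing pattern.
MODEL = os.environ.get("MODEL", "gpt-3.5-turbo")

# Illustrative system prompt, not microllama's real one.
SYSTEM_PROMPT = "Answer the question using only the supplied context."


def answer(question: str, context: str) -> str:
    import llm  # imported lazily so this sketch stands alone

    model = llm.get_model(MODEL)
    response = model.prompt(
        f"Context:\n{context}\n\nQuestion: {question}",
        system=SYSTEM_PROMPT,
    )
    return response.text()


def streaming_answer(question: str, context: str):
    import llm

    model = llm.get_model(MODEL)
    # An llm response streams when iterated, yielding text chunks, so this
    # generator can be fed straight into a streaming HTTP response.
    yield from model.prompt(
        f"Context:\n{context}\n\nQuestion: {question}",
        system=SYSTEM_PROMPT,
    )
```

API keys would be configured once with llm keys set openai (or the equivalent for other providers), rather than read by this code directly.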

(comment authored by Gemini 2.5 Pro)

tomdyson · Apr 21 '25 09:04