
epic: Jan has Conversation-based RAG


Objectives

  • Allow users to upload text files (PDFs, docs, etc.) within the chat interface.
  • Deliver a user-friendly file upload mechanism, enabling seamless integration into the chat environment.

Leads

  • Product: @imtuyethan
  • Engineering: @hiro-v

User Stories

In Scope

  1. As a User, I want to upload text files to the chat:

    • Scenario: When I have a text-based file (PDF, doc, etc.), I can upload it directly within the chat interface.
    • Acceptance Criteria: I should be able to select a file from my device and upload it to the chat window.
  2. As a User, I want to view the uploaded file's content:

    • Scenario: Upon uploading a text file, I want to view its content displayed within the chat.
    • Acceptance Criteria: The chat interface should visually represent the uploaded file's content, making it accessible alongside the conversation.
  3. As a User, I want to ask questions related to the uploaded file:

    • Scenario: After uploading a file, I want to input prompts or questions about its content within the chat.
    • Acceptance Criteria: The chat interface should allow me to type prompts or questions, linking them contextually to the specific file content for relevant responses.
  4. As a User, I want to receive responses based on file-specific queries:

    • Scenario: When I input queries related to the uploaded file, I expect relevant and contextual responses within the chat.
    • Acceptance Criteria: The system should process my queries about the uploaded file, providing accurate and appropriate responses in the conversation thread.
  5. As a User, I want to understand the single-file upload limitation:

    • Scenario: When attempting to upload multiple files simultaneously, I should receive information about this limitation.
    • Acceptance Criteria: The system should display notifications or error messages, informing me that only one file can be uploaded at a time.

Out-of-Scope

  • As a user, I want to see prompt suggestions based on the capabilities of the model.
  • As a user, I want to attach multiple files at the same time.
  • As a user, I want to attach other file formats, such as audio.

Design Wireframes

Figma link: https://www.figma.com/file/ytn1nRZ17FUmJHTlhmZB9f/Jan-App?type=design&node-id=783-43738&mode=design&t=7KYGjHy7F1RvqEip-4

Engineering & Architecture

In Scope

  • List of supported sources: parsable binary files with the extensions .pdf and .docx (a loading sketch follows)
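
For illustration, a minimal sketch of how the two supported formats could be parsed, assuming the langchain.js document loaders (PDFLoader wraps pdf-parse, DocxLoader wraps mammoth); the helper is hypothetical, not Jan's actual code:

import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { DocxLoader } from "langchain/document_loaders/fs/docx";
import { Document } from "langchain/document";

// Hypothetical helper: dispatch on the two supported extensions.
async function loadSupportedFile(filePath: string): Promise<Document[]> {
  if (filePath.endsWith(".pdf")) return new PDFLoader(filePath).load();
  if (filePath.endsWith(".docx")) return new DocxLoader(filePath).load();
  throw new Error(`Unsupported file format: ${filePath}`);
}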

Out-of-Scope

  • List of unsupported files: all other formats

Tasklist

  • [x] #1126
  • [x] #1335 @urmauur
  • [x] Assistant using event-based communication (Event.on and Event.emit)
  • [x] Message request refactoring (wrong format; OpenAI does not work with the request body) - @louis-jan
  • [x] Fix thread history (could not retrieve messages somehow) - @louis-jan
  • [x] Query ingested documents from inference extensions (currently the response from the assistant is hard-coded, with no ingestion) - @louis-jan
  • [ ] Add indices for new documents if the memory folder exists - @hiro-v
  • [x] Add retrieval settings in UI for user to change

Resources

https://www.chatpdf.com/c/vzHhtas3uQVZDK9ZGglaw


dan-menlo commented on Dec 19, 2023

Archiving @dan-jan's original comment because I need to put my product specs on top so the subtasks can be worked on in GitHub:

Spec

  • WIP
  • Conversation-based RAG
  • Parity with GPT-4, where users can upload a PDF and ask questions about it

Appendix

Why not /files approach?

  • Jan can change models
  • We would need to reindex embeddings every time there is a model switch?

imtuyethan commented on Dec 20, 2023

Storage:

/threads
  /thread-1
    /files
      my.pdf
  /thread-2

Future iteration: we symlink files so they are not duplicated. Plus: add a design that shows when models do not support RAG.
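A minimal TypeScript sketch of this layout, assuming a `janDataDir` root; the helper names are illustrative only:

import * as path from "path";
import * as fs from "fs";

// Assumed layout: jan/threads/<thread_id>/files/<file> and .../memory
function threadFilePath(janDataDir: string, threadId: string, fileName: string): string {
  return path.join(janDataDir, "threads", threadId, "files", fileName);
}

function threadMemoryDir(janDataDir: string, threadId: string): string {
  return path.join(janDataDir, "threads", threadId, "memory");
}

// Relevant to the open task above: append new indices only when the
// memory folder already exists.
function hasExistingMemory(janDataDir: string, threadId: string): boolean {
  return fs.existsSync(threadMemoryDir(janDataDir, threadId));
}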

Right panel:

  • In the Assistant section, under a category called "Tools"
  • Have a checked [x] checkbox for File Retrieval
  • In the future this section will have web search and other tools

Similar to OpenAI's CreateGPT flow

imtuyethan commented on Dec 26, 2023

Eng spec

(architecture diagram)

There are 3 scenarios (a dispatch sketch follows the list):

  • Chat with the assistant, tools/retrieval toggled off => chat normally.
  • Chat with the assistant, tools/retrieval toggled on, with a file uploaded as reference (parsable PDF atm) => ingestion phase.
  • Chat with the assistant, tools/retrieval toggled on, with a query => query phase.
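
A rough TypeScript sketch of that dispatch; the type and the three phase functions are hypothetical stand-ins, not the actual extension code:

// Hypothetical shape of a chat turn, for illustration only.
interface ChatTurn {
  retrievalEnabled: boolean;
  attachedFile?: string; // parsable PDF at the moment
  prompt: string;
}

async function handleTurn(turn: ChatTurn): Promise<void> {
  if (!turn.retrievalEnabled) {
    await chatNormally(turn.prompt);        // scenario 1: toggle off
  } else if (turn.attachedFile) {
    await ingestFile(turn.attachedFile);    // scenario 2: ingestion phase
  } else {
    await answerWithRetrieval(turn.prompt); // scenario 3: query phase
  }
}

// Placeholders for the real phase implementations.
declare function chatNormally(prompt: string): Promise<void>;
declare function ingestFile(filePath: string): Promise<void>;
declare function answerWithRetrieval(prompt: string): Promise<void>;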

Communication layers:

  • The model (nitro extension/openai extension) currently uses event-based processing: on/emit.
  • The assistant was refactored from function-based to event-based processing: on/emit as well.
  • The retrieval tool is designed in a way that:
    • It can be toggled on/off and acts as middleware.
    • A tool can have a node or browser runtime; for retrieval, it's the node runtime:
extensions/assistant-extension/src
├── @types
│   └── global.d.ts
├── index.ts
└── node
    ├── index.ts
    └── tools
        └── retrieval
            └── index.ts
    • Tool/retrieval can accept multiple input formats (web content for web browsing, multiple file formats).
    • Tool/retrieval settings can be configured on the right-hand side as thread-based settings. The global settings are stored in `jan/assistants/jan/assistant.json` as follows (a sketch for reading them comes after the JSON):
{
  "avatar": "",
  "id": "jan",
  "object": "assistant",
  "created_at": 1705549969445,
  "name": "Jan",
  "description": "A default assistant that can use all downloaded models",
  "model": "*",
  "instructions": "",
  "tools": [
    {
      "type": "retrieval",
      "enabled": true,
      "settings": {}
    }
  ],
  "file_ids": []
}
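
As a sketch, reading the retrieval entry out of that JSON could look like this; the types mirror the shape above and the helper is illustrative, not Jan's actual code:

import * as fs from "fs";

// Mirrors the "tools" entries in assistant.json shown above.
interface AssistantTool {
  type: string;
  enabled: boolean;
  settings: Record<string, unknown>;
}

interface AssistantConfig {
  id: string;
  tools: AssistantTool[];
}

function isRetrievalEnabled(assistantJsonPath: string): boolean {
  const config: AssistantConfig = JSON.parse(fs.readFileSync(assistantJsonPath, "utf8"));
  const retrieval = config.tools.find((t) => t.type === "retrieval");
  return retrieval?.enabled ?? false;
}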

Tools of choice:

  • VectorStore: HNSW binding in node (see the sketch after this list)
  • Embedding:
    • Likely to use nitro-served embeddings instead, for BGE/sentence-transformer (BERT-based) models
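
A minimal sketch of the ingestion side using langchain.js's HNSWLib vector store (a node binding of hnswlib); the embeddings instance is assumed to come from the nitro-served model, and the directory follows the thread layout above:

import { HNSWLib } from "langchain/vectorstores/hnswlib";
import type { Embeddings } from "langchain/embeddings/base";
import { Document } from "langchain/document";

// Assumed to be provided by the nitro-served embedding model.
declare const embeddings: Embeddings;

async function buildMemory(docs: Document[], memoryDir: string): Promise<void> {
  const store = await HNSWLib.fromDocuments(docs, embeddings);
  await store.save(memoryDir); // e.g. jan/threads/<thread_id>/memory
}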

What to do next, even after this

  • Fine-tuned embedding models

hiro-v commented on Jan 19, 2024

Questions and Answers:

  1. What are we being opinionated about, and why? (i.e. our choice of hnsw and langchain, no llamaindex) => Answer:
  • Using langchain (rather than llama_index) at the moment is an opinionated choice that Hiro made because:
    • We are not dependent on any existing libs, but we need an abstraction layer for the vector DB and pre-processing steps (e.g. a text splitter) where we do not want to re-invent the wheel.
    • langchain.js was more actively developed than the llama_index TS version at the time we were developing.
  • Choosing hnsw is an opinionated option too, as it's the most lightweight and highly compatible option that can be embedded on any OS/CPU of choice.
  2. What abstractions need to happen in the future to allow for a bring-your-own-vector-DB situation? => Answer:
  • Hiro thinks yes, absolutely; that's why langchain.js was chosen to abstract the interface (see the sketch below).
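
For example, retrieval code written against langchain.js's VectorStore base class works with HNSWLib today and would let another backend be swapped in later; the factory below is a hypothetical illustration:

import type { VectorStore } from "langchain/vectorstores/base";
import type { Embeddings } from "langchain/embeddings/base";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

// Hypothetical factory: only HNSWLib today, but callers depend only on
// the VectorStore interface, so other backends can be added here later.
async function openVectorStore(dir: string, embeddings: Embeddings): Promise<VectorStore> {
  return HNSWLib.load(dir, embeddings);
}

async function retrieve(store: VectorStore, query: string, k = 4) {
  return store.similaritySearch(query, k); // works for any VectorStore
}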
  1. Any "hacky" solutions employed to get things to work for now => Answer

  2. Impact on user disk / Jan Folder / resource hogging => Answer:

  • The files save in `jan/threads/<threads_id>/<message_id>.extension
  • The memory as files saved in `jan/threads/<threads_id>/memory/** (this on is packageable)
  • Once the memory is there, newly ingested file will be appended.
  5. Where are the eng specs? => Answer: https://github.com/janhq/jan/issues/1076#issuecomment-1899553830

  6. Will it be available via the local API server? => Answer: Yes, but we have not thought this one through at the moment. However, it will be designed similarly to OpenAI GPTs' runs.

  7. How are we chunking? => Answer:

  • There are 2 parameters in the TextSplitter (text only): chunk size and overlap. We set them to fixed defaults but will let users configure them in the settings (thread level). A sketch follows.
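
Assuming langchain.js's text splitter (consistent with the langchain choice above), the two parameters map to chunkSize and chunkOverlap; the numbers here are placeholders, not Jan's real defaults:

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Placeholder defaults; per the answer above, users can override these
// in the thread-level settings.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1024,
  chunkOverlap: 64,
});

// Top-level await assumes an ESM context.
const chunks = await splitter.createDocuments(["...full document text..."]);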
  8. How is the LLM map-reducing across similar vectors? Is that configurable by the user? => Answer:
  • text -> embedding -> similaritySearch (top-k) -> rerank (see the sketch below).
  • Users can configure these settings.
  • To be updated.
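
A sketch of that pipeline; the reranker is a placeholder since that step is still marked "to be updated":

import type { VectorStore } from "langchain/vectorstores/base";
import { Document } from "langchain/document";

async function queryPhase(store: VectorStore, question: string, topK: number): Promise<Document[]> {
  // similaritySearch embeds the query via the store's embeddings model
  const candidates = await store.similaritySearch(question, topK);
  return rerank(question, candidates); // hypothetical rerank step
}

// Placeholder for the yet-to-be-specified reranking step.
declare function rerank(question: string, docs: Document[]): Promise<Document[]>;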
  9. If the user uses a different embeddings layer (model A) for doc ingestion vs. user queries (model B), our current approach seems hyper-opinionated. => Answer:
  • There are 2 models: an embedding model for retrieval, and an LLM for text generation.
  • The LLM can be used for the retrieval task as well.
  • In the current implementation the nitro-served LLM plays both roles, so if the user changes the model, they need to ingest again. One way to avoid this is to disable changing models mid-thread.
  • The likely way is to split these into 2 models, so that the embedding model does not always change. One option is adding https://github.com/FFengIll/embedding.cpp for serving sentence-transformer/BGE models, or even writing it as a node-gyp binding to use inside Jan alone. (A sketch of a dedicated embeddings layer follows.)
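
A sketch of what a dedicated embeddings layer could look like, so the index survives LLM switches; the endpoint and request/response shapes are assumptions, not a real nitro or embedding.cpp API:

import { Embeddings } from "langchain/embeddings/base";

// Hypothetical: a fixed, locally served embedding model, decoupled from
// whichever chat LLM the thread currently uses. Requires Node 18+ for fetch.
class LocalServerEmbeddings extends Embeddings {
  constructor(private endpoint: string) {
    super({});
  }

  async embedQuery(text: string): Promise<number[]> {
    const res = await fetch(this.endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: text }), // assumed request shape
    });
    const json = await res.json();
    return json.embedding as number[]; // assumed response shape
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((t) => this.embedQuery(t)));
  }
}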

hiro-v commented on Jan 23, 2024

@louis-jan's point on the framework layer:

  • Extensions are not lightweight anymore (the assistant extension .tar.gz is now 100MB) => should be reused ===> retrieval extension

hiro-v commented on Jan 23, 2024

TODO:

  • Very clear engineering specs - @hiro-v (+ @louis-jan @dan-jan)

@alan

  • RAG in 2 steps:
    • Something that just works at the moment
    • An improved version

hiro-v commented on Jan 23, 2024