# epic: Jan has Conversation-based RAG
## Objectives
- Allow users to upload text files (PDFs, docs, etc.) within the chat interface.
- Provide a user-friendly file-upload mechanism that integrates seamlessly into the chat environment.
## Leads
- Product: @imtuyethan
- Engineering: @hiro-v
## User Stories
### In Scope
- As a User, I want to upload text files to the chat:
  - Scenario: When I have a text-based file (PDF, doc, etc.), I can upload it directly within the chat interface.
  - Acceptance Criteria: I should be able to select a file from my device and upload it to the chat window.
- As a User, I want to view the uploaded file's content:
  - Scenario: Upon uploading a text file, I want to view its content displayed within the chat.
  - Acceptance Criteria: The chat interface should visually represent the uploaded file's content, making it accessible alongside the conversation.
- As a User, I want to ask questions related to the uploaded file:
  - Scenario: After uploading a file, I want to input prompts or questions about its content within the chat.
  - Acceptance Criteria: The chat interface should allow me to type prompts or questions, linking them contextually to the specific file content for relevant responses.
- As a User, I want to receive responses based on file-specific queries:
  - Scenario: When I input queries related to the uploaded file, I expect relevant and contextual responses within the chat.
  - Acceptance Criteria: The system should process my queries about the uploaded file, providing accurate and appropriate responses in the conversation thread.
- As a User, I understand the limitation on multiple file uploads:
  - Scenario: When attempting to upload multiple files simultaneously, I should receive information about this limitation.
  - Acceptance Criteria: The system should display notifications or error messages informing me that only one file can be uploaded at a time.
### Out-of-Scope
- As a user, I want to see prompt suggestions based on the capabilities of the model.
- As a user, I want to attach multiple files at the same time.
- As a user, I want to attach other file formats, such as audio.
## Design Wireframes
Figma link: https://www.figma.com/file/ytn1nRZ17FUmJHTlhmZB9f/Jan-App?type=design&node-id=783-43738&mode=design&t=7KYGjHy7F1RvqEip-4
## Engineering & Architecture
### In Scope
- List of supported sources: parsable binary files with extensions `.pdf` and `.docx`
### Out-of-Scope
- List of unsupported files: all other formats
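A minimal sketch of how this supported/unsupported split, plus the single-file limit from the user stories, might be enforced; the function names here are hypothetical, not Jan's actual API:

```ts
// Hypothetical validation sketch for uploads. Names are illustrative.
import * as path from "path";

const SUPPORTED_EXTENSIONS = [".pdf", ".docx"];

function validateUpload(files: string[]): string {
  if (files.length !== 1) {
    // Only one file can be uploaded at a time (see the user stories above).
    throw new Error("Only one file can be uploaded at a time.");
  }
  const ext = path.extname(files[0]).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    throw new Error(`Unsupported file type: ${ext}`);
  }
  return files[0];
}
```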
## Tasklist
- [x] #1126
- [x] #1335 @urmauur
- [x] Assistant using event-based communication (`Event.on` and `Event.emit`)
- [x] Message request refactoring (wrong format; OpenAI does not work with the request body) - @louis-jan
- [x] Fix thread history (could not retrieve messages somehow) - @louis-jan
- [x] Query ingested documents from inference extensions (the assistant's response is currently hard-coded; no ingestion) - @louis-jan
- [ ] Add indices for new documents if the memory folder exists - @hiro-v
- [x] Add `retrieval` settings in the UI for users to change
## Resources
https://www.chatpdf.com/c/vzHhtas3uQVZDK9ZGglaw
## Out of scope
Archiving @dan-jan's original comment below, because the product specs need to sit on top for GitHub subtasks to work:
### Spec
- WIP
- Conversation-based RAG
- Parity with GPT-4, where users can upload a PDF and ask questions about it
### Appendix
Why not a `/files` approach?
- Jan can change models
- We would need to reindex embeddings every time there is a model switch?
Storage:

    /threads
      /thread-1
        /files
          my.pdf
      /thread-2

Future iteration: we symlink files so they are not duplicated. Plus: add a design that shows when models do not support RAG.
Right panel:
- In the Assistant section, under a category called "Tools"
- Have a checked [x] checkbox for File Retrieval
- In the future this section will have web search and other tools
Similar to OpenAI's createGPT flow.
### Eng spec
There are 3 scenarios:
- Chat with the assistant, `tools/retrieval` toggled off => chat normally.
- Chat with the assistant, `tools/retrieval` toggled on, and upload a file as reference (parsable PDF at the moment) => ingestion phase.
- Chat with the assistant, `tools/retrieval` toggled on, and query => query phase.
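A minimal sketch of how these three scenarios might dispatch, using the event-based `on`/`emit` style described below; the event names, payload shape, and helper functions are hypothetical assumptions, not Jan's actual extension API:

```ts
// Hypothetical dispatch for the three scenarios above.
import { EventEmitter } from "events";

const Event = new EventEmitter();

interface ChatMessage {
  threadId: string;
  text: string;
  attachedFile?: string;     // path to an uploaded (parsable) PDF
  retrievalEnabled: boolean; // the tools/retrieval toggle
}

Event.on("message:sent", async (msg: ChatMessage) => {
  if (!msg.retrievalEnabled) {
    Event.emit("model:inference", msg); // scenario 1: chat normally
    return;
  }
  if (msg.attachedFile) {
    await ingestFile(msg.threadId, msg.attachedFile); // scenario 2: ingestion phase
  }
  // Scenario 3: query phase — augment the prompt with retrieved context.
  const context = await queryMemory(msg.threadId, msg.text);
  Event.emit("model:inference", {
    ...msg,
    text: `Context:\n${context}\n\nQuestion: ${msg.text}`,
  });
});

// Placeholders for the node-runtime retrieval tool.
declare function ingestFile(threadId: string, filePath: string): Promise<void>;
declare function queryMemory(threadId: string, query: string): Promise<string>;
```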
Communication layers:
- `Model` (nitro extension / openai extension) currently uses event-based processing: `on`/`emit`.
- Assistant refactored from function-based to event-based processing: `on`/`emit` as well.
- Tool retrieval is designed in a way that:
  - It can be toggled on/off and acts as middleware.
  - A tool can have a node or browser runtime. For retrieval, it is the node runtime:

    extensions/assistant-extension/src
    ├── @types
    │   └── global.d.ts
    ├── index.ts
    └── node
        ├── index.ts
        └── tools
            └── retrieval
                └── index.ts

- Tool/retrieval can accept multiple input formats (web content for web browsing, multiple file formats).
- Tool/retrieval settings can be configured on the right-hand side as thread-based settings. The global one is stored in `jan/assistants/jan/assistant.json` as follows (a settings-resolution sketch follows the JSON):

    {
      "avatar": "",
      "id": "jan",
      "object": "assistant",
      "created_at": 1705549969445,
      "name": "Jan",
      "description": "A default assistant that can use all downloaded models",
      "model": "*",
      "instructions": "",
      "tools": [
        {
          "type": "retrieval",
          "enabled": true,
          "settings": {}
        }
      ],
      "file_ids": []
    }
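A hypothetical sketch of resolving effective retrieval settings: thread-level overrides layered over the global `assistant.json` shown above. The helper name and override shape are illustrative, not Jan's actual code:

```ts
// Hypothetical settings resolution: thread settings win over global defaults.
import * as fs from "fs";
import * as path from "path";

interface ToolConfig {
  type: string;
  enabled: boolean;
  settings: Record<string, unknown>;
}

function resolveRetrievalSettings(
  janRoot: string,
  threadOverride?: Partial<ToolConfig>
): ToolConfig {
  const assistantPath = path.join(janRoot, "assistants", "jan", "assistant.json");
  const assistant = JSON.parse(fs.readFileSync(assistantPath, "utf8"));
  const globalTool: ToolConfig = assistant.tools.find(
    (t: ToolConfig) => t.type === "retrieval"
  );
  // Thread-based settings (right-hand panel) override the global defaults.
  return {
    ...globalTool,
    ...threadOverride,
    settings: { ...globalTool.settings, ...threadOverride?.settings },
  };
}
```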
Tools of choice:
- VectorStore: `HNSW` binding in node (see the ingestion sketch below)
- Embedding: will likely use a nitro-served embedding instead, for BGE / sentence-transformer (BERT-based) models
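A sketch of the ingestion phase under these tool choices: langchain.js with the `HNSWLib` vector store (backed by `hnswlib-node`) and an OpenAI-compatible embeddings client pointed at a local server. Import paths vary by langchain.js version, and the endpoint URL and key handling are assumptions:

```ts
// Sketch of the ingestion phase under the stated tool choices.
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib"; // backed by hnswlib-node
import { OpenAIEmbeddings } from "@langchain/openai";
import type { Document } from "@langchain/core/documents";

// Assumption: a nitro-served, OpenAI-compatible embeddings endpoint (e.g. BGE).
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: "not-needed-locally", // placeholder; a local server may ignore it
  configuration: { baseURL: "http://localhost:3928/v1" }, // hypothetical local URL
});

async function ingest(chunks: Document[], memoryDir: string): Promise<void> {
  // Embed the chunks and build the HNSW index in one step.
  const store = await HNSWLib.fromDocuments(chunks, embeddings);
  // Persist next to the thread, e.g. jan/threads/<thread_id>/memory
  await store.save(memoryDir);
}
```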
What to do next, even after this:
- Fine-tuned embedding models
### Questions and Answers
- Where are we being opinionated, and why? (i.e. our choice of hnsw, langchain, no llamaindex)
  => Answer:
  - Using `langchain` and `llama_index` at the moment is an opinionated choice that Hiro made because:
    - We are not dependent on any existing libs, but we need an abstraction layer for the `vdb` and `pre-processing steps` (e.g. text splitter) so that we do not re-invent the wheel.
    - `langchain.js` is more actively developed than `llama_index TS` at the time we are developing.
- Choosing `hnsw` is an opinionated option too, as it is the most lightweight and highly compatible option that can be embedded on any OS/CPU of choice.
- What abstractions need to happen in the future to allow for a bring-your-own-vdb situation? => Answer:
  - Hiro thinks `yes, absolutely`; that is why he chose `langchain.js` to abstract the interface.
- Any "hacky" solutions employed to get things to work for now? => Answer:
- Impact on user disk / Jan folder / resource hogging => Answer:
  - Files are saved in `jan/threads/<thread_id>/<message_id>.<extension>`
  - The memory is saved as files in `jan/threads/<thread_id>/memory/**` (this one is packageable)
  - Once the memory is there, newly ingested files will be appended.
- Where are eng specs? https://github.com/janhq/jan/issues/1076#issuecomment-1899553830
- Will it be available via the local API server? => Answer: Yes, but we have not thought this one through at the moment. However, it will be designed similarly to `OpenAI GPTs runs`.
- How are we chunking?
  - There are 2 parameters in `TextSplitter` (text only): `Chunking` and `Overlap`. We set them to fixed defaults but will let users configure them in the settings (thread level). See the sketch below.
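A sketch of those two parameters using langchain.js's `RecursiveCharacterTextSplitter`; the specific splitter class and the default numbers here are assumptions, not Jan's confirmed choices:

```ts
// Sketch of the two TextSplitter parameters. The splitter class and defaults
// are assumptions; thread-level settings would expose these numbers.
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import type { Document } from "langchain/document";

async function chunk(
  text: string,
  chunkSize = 1000,   // "Chunking": max characters per chunk
  chunkOverlap = 200  // "Overlap": characters shared between adjacent chunks
): Promise<Document[]> {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize, chunkOverlap });
  return splitter.createDocuments([text]);
}
```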
- How is the LLM map-reducing across similar vectors? Is that configurable by the user? => Answer:
  - `text` -> `embedding` -> `similaritySearch` (top-k) -> rerank (see the sketch below).
  - Users can configure these settings.
  - To be updated.
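A sketch of that query pipeline under the tool choices above (langchain.js `HNSWLib` store); the rerank step is omitted and the prompt template is an assumption:

```ts
// Sketch of the query phase: embed the query, fetch the top-k nearest chunks,
// and build the augmented prompt sent to the LLM for text-generation.
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";

async function answerQuery(store: HNSWLib, query: string, k = 4): Promise<string> {
  // similaritySearch embeds the query and returns the k nearest chunks.
  const docs = await store.similaritySearch(query, k);
  const context = docs.map((d) => d.pageContent).join("\n---\n");
  // Hypothetical prompt template; rerank would slot in before this step.
  return `Context:\n${context}\n\nQuestion: ${query}`;
}
```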
- If the user uses a different embedding layer (model A) for doc ingestion vs. user queries (model B), our current approach seems hyper-opinionated.
  - There are 2 models: an embedding model for `retrieval`, and an LLM for `text-generation`.
  - The LLM can be used for the retrieval task as well.
  - In the current implementation, the nitro-served LLM plays both roles, so if the user changes the model, they need to ingest again. One way to avoid this is to disable `Changing model in mid-thread`.
  - The `likely way` is to split these into 2 models, so that the embedding model does not always change (see the sketch below). One option is adding https://github.com/FFengIll/embedding.cpp for serving `sentence transformer` or `bge`, or even writing it as node-gyp to use inside Jan alone.
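A hypothetical sketch of that two-model split: the embedding model is pinned per thread once memory exists, so switching the text-generation LLM does not force re-ingestion. All names here are illustrative:

```ts
// Hypothetical two-model split: re-ingestion is only required when the
// *embedding* model changes, never when the generation LLM is swapped.
interface ThreadModels {
  embeddingModel: string;  // e.g. a BGE model, pinned once memory exists
  generationModel: string; // freely switchable LLM
}

function canSwitchWithoutReingest(
  current: ThreadModels,
  next: ThreadModels,
  memoryExists: boolean
): boolean {
  if (!memoryExists) return true;
  return next.embeddingModel === current.embeddingModel;
}
```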
@louis-jan's point on the framework layer:
- The extension is not lightweight anymore (the assistant extension `.tar.gz` is now 100MB) => should be reused ===> `retrieval extension`
### TODO
- Very clear engineering specs - @hiro-v (+ @louis-jan @dan-jan)
- RAG in 2 steps - @alan:
  - Something that just works at the moment
  - Improved version