obsidian-Smart2Brain
Too slow within Obsidian
What happened?
I tried using Llama 3 and Phi-3. Performance is good for both of these models in Jan UI and Ollama. However, when used within Obsidian, it takes 3-4 minutes to retrieve a response at 0% creativity and 20% similarity.
Error Statement
No response
Steps to Reproduce
- Change provider to llama-3/phi-3
- Open chat window
- Compare inference time with Jan UI/Ollama (see the timing sketch below)
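For a baseline outside Obsidian, you can time a generation against the local Ollama HTTP API directly. A minimal sketch, assuming Ollama is running on its default port 11434 and that `llama3` matches the model tag you pulled:

```ts
// Time a single non-streaming generation against the local Ollama API.
// Assumes Ollama is listening on localhost:11434 and the "llama3" tag
// matches a model you have already pulled.
async function timeOllama(prompt: string): Promise<void> {
  const start = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt, stream: false }),
  });
  const data = await res.json();
  console.log(`Elapsed: ${(Date.now() - start) / 1000}s`);
  console.log(data.response);
}

timeOllama("Summarize the idea of a second brain in one sentence.").catch(console.error);
```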
Smart Second Brain Version
1.0.2
Debug Info
SYSTEM INFO:
Obsidian version: v1.5.12
Installer version: v1.4.16
Operating system: Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:34 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T8103 23.0.0
Login status: not logged in
Insider build toggle: off
Live preview: on
Base theme: dark
Community theme: Atom v0.0.0
Snippets enabled: 1
Restricted mode: off
Plugins installed: 4
Plugins enabled: 3
1: Supercharged Links v0.12.1
2: Style Settings v1.0.8
3: Smart Second Brain v1.0.2
Sorry the inference is taking so long. I could not figure this out: does Jan.ai use Ollama to run the models, or are you talking about two different cases?
No. Jan can pull Llama 3 on its own. I used it to double-check whether there was an issue with my hardware.
I am using Ollama for the plugin.
The lower the similarity score, the more notes are retrieved. If these notes are bigger than the LLM's max context size, we summarize them hierarchically to make them fit, which can take some time, as described here.
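Roughly, the flow looks like this. A minimal sketch, not the plugin's actual code: `retrieveNotes`, `summarize`, and `countTokens` are hypothetical stand-ins for its internals.

```ts
interface Note { path: string; text: string; score: number }

// Hypothetical helpers — stand-ins for the plugin's retrieval and LLM calls.
declare function retrieveNotes(query: string): Promise<Note[]>;
declare function summarize(text: string): Promise<string>;
declare function countTokens(text: string): number;

// A lower similarity threshold lets more notes through; if their combined
// size exceeds the model's context window, they are summarized level by
// level until they fit.
async function buildContext(
  query: string,
  similarityThreshold: number, // the "similarity" slider, 0..1
  maxContextTokens: number,
): Promise<string> {
  const notes = await retrieveNotes(query);
  let chunks = notes
    .filter((n) => n.score >= similarityThreshold)
    .map((n) => n.text);

  // Hierarchical summarization: each pass halves the number of chunks by
  // summarizing pairs, costing one extra LLM call per pair — this is the
  // slow part when many notes match.
  while (countTokens(chunks.join("\n")) > maxContextTokens && chunks.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < chunks.length; i += 2) {
      next.push(await summarize(chunks.slice(i, i + 2).join("\n")));
    }
    chunks = next;
  }
  return chunks.join("\n");
}
```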
But this will always bottleneck exactly when the assistant is most needed. If I have only 10 notes, I can manage them without the plugin; and if I increase the base model's context length, it no longer fits in VRAM.
@SmokeShine, that's not quite what @Leo310 meant... You can index thousands of notes, but when you actually run a query (Smart Second Brain chat with RAG enabled), it has to find the appropriate notes and fit them into the model's context window... or else it needs to summarize them, which takes time. So you'll get a faster response by increasing the "similarity" slider, so that fewer notes are retrieved for the RAG query.
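To make the slider's effect concrete, here is a toy illustration (the scores are made up, not the plugin's code):

```ts
// Raising the similarity cutoff shrinks the retrieved set, so the notes
// fit the context window directly and no summarization passes run.
const scores = [0.92, 0.81, 0.74, 0.41, 0.33, 0.27, 0.22]; // hypothetical retrieval scores

const at20 = scores.filter((s) => s >= 0.2).length; // 7 notes -> likely needs summarization
const at60 = scores.filter((s) => s >= 0.6).length; // 3 notes -> likely fits as-is

console.log({ at20, at60 });
```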