obsidian-Smart2Brain
Too slow within Obsidian
What happened?
I tried using Llama 3 and Phi-3. Performance is good for both of these models in Jan UI and Ollama. However, when used within Obsidian, it takes 3-4 minutes to retrieve a response at 0% creativity and 20% similarity.
Error Statement
No response
Steps to Reproduce
- Change provider to llama-3/phi-3
- Open chat window
- Compare inference time with Jan UI/Ollama (see the timing sketch below)
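For a baseline outside Obsidian, you can time a generation against the local Ollama HTTP API directly. A minimal sketch, assuming Ollama is running on its default port 11434 and that `llama3` matches the model tag you pulled:

```ts
// Time a single non-streaming generation against the local Ollama API.
// Assumes Ollama is listening on localhost:11434 and the "llama3" tag
// matches a model you have already pulled.
async function timeOllama(prompt: string): Promise<void> {
  const start = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt, stream: false }),
  });
  const data = await res.json();
  console.log(`Elapsed: ${(Date.now() - start) / 1000}s`);
  console.log(data.response);
}

timeOllama("Summarize the idea of a second brain in one sentence.").catch(console.error);
```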
Smart Second Brain Version
1.0.2
Debug Info
SYSTEM INFO:
Obsidian version: v1.5.12
Installer version: v1.4.16
Operating system: Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:34 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T8103 23.0.0
Login status: not logged in
Insider build toggle: off
Live preview: on
Base theme: dark
Community theme: Atom v0.0.0
Snippets enabled: 1
Restricted mode: off
Plugins installed: 4
Plugins enabled: 3
1: Supercharged Links v0.12.1
2: Style Settings v1.0.8
3: Smart Second Brain v1.0.2
Sorry the inference is taking so long. I could not figure this out: does Jan.ai use Ollama to run the models, or are you talking about two different cases?
No. Jan can pull Llama 3 on its own. I used it to double-check whether there was an issue with my hardware.
I am using Ollama for the plugin.
The lower the similarity score, the more notes are retrieved. If these notes are bigger than the LLM's max context size, we summarize them hierarchically to make them fit, which can take some time, as described here.
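Roughly, the flow looks like this. A minimal sketch, not the plugin's actual code: `retrieveNotes`, `summarize`, and `countTokens` are hypothetical stand-ins for its internals.

```ts
interface Note { path: string; text: string; score: number }

// Hypothetical helpers — stand-ins for the plugin's retrieval and LLM calls.
declare function retrieveNotes(query: string): Promise<Note[]>;
declare function summarize(text: string): Promise<string>;
declare function countTokens(text: string): number;

// A lower similarity threshold lets more notes through; if their combined
// size exceeds the model's context window, they are summarized level by
// level until they fit.
async function buildContext(
  query: string,
  similarityThreshold: number, // the "similarity" slider, 0..1
  maxContextTokens: number,
): Promise<string> {
  const notes = await retrieveNotes(query);
  let chunks = notes
    .filter((n) => n.score >= similarityThreshold)
    .map((n) => n.text);

  // Hierarchical summarization: each pass halves the number of chunks by
  // summarizing pairs, costing one extra LLM call per pair — this is the
  // slow part when many notes match.
  while (countTokens(chunks.join("\n")) > maxContextTokens && chunks.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < chunks.length; i += 2) {
      next.push(await summarize(chunks.slice(i, i + 2).join("\n")));
    }
    chunks = next;
  }
  return chunks.join("\n");
}
```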
But this will always bottleneck exactly when the assistant is most needed. If I have only 10 notes, I can manage them without the plugin; and if I increase the base model's context length, it no longer fits in VRAM.
@SmokeShine, that's not quite what @Leo310 meant... You can index thousands of notes, but when you actually run a query (Smart Second Brain chat with RAG enabled), it has to find the appropriate notes and fit them into the model's context window... or else it needs to summarize them, which takes time. So you'll get a faster response by increasing the "similarity" slider, so that fewer notes are retrieved for the RAG query.
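To make the slider's effect concrete, here is a toy illustration (the scores are made up, not the plugin's code):

```ts
// Raising the similarity cutoff shrinks the retrieved set, so the notes
// fit the context window directly and no summarization passes run.
const scores = [0.92, 0.81, 0.74, 0.41, 0.33, 0.27, 0.22]; // hypothetical retrieval scores

const at20 = scores.filter((s) => s >= 0.2).length; // 7 notes -> likely needs summarization
const at60 = scores.filter((s) => s >= 0.6).length; // 3 notes -> likely fits as-is

console.log({ at20, at60 });
```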