Why do the same locally deployed large models run slowly in Copilot?
I installed Ollama and downloaded various models from 8B to 70B, intending to use the Copilot plugin in Obsidian to query my note content with these local models.
The actual performance was quite poor. Even the 8B Qwen and DeepSeek models were much slower than with the same settings in CherryStudio. They also often gave irrelevant answers, or got stuck endlessly repeating the same words (common with qwen3-vl, rare with DeepSeek; the 70B Qwen and DeepSeek models, by contrast, were concise and on-topic).
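For anyone trying to narrow this down: one way to rule the plugin in or out is to send the same prompt straight to Ollama's HTTP API and compare the timings it reports against what Copilot feels like. This is only a minimal sketch, assuming a default Ollama install on localhost:11434; the model tag `qwen3:8b` is a placeholder, and the `num_ctx` / `repeat_penalty` values are just example settings that commonly affect speed and word-loop behavior.

```python
import requests

# Send one prompt directly to the local Ollama server, bypassing any plugin.
# Assumes Ollama's default address; the model tag is a placeholder (see `ollama list`).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:8b",          # placeholder model tag
        "prompt": "Summarize the idea of spaced repetition in two sentences.",
        "stream": False,
        "options": {
            "num_ctx": 8192,          # context window; very large values slow generation
            "repeat_penalty": 1.1,    # mild penalty, often enough to curb repeated-word loops
            "temperature": 0.7,
        },
    },
    timeout=600,
)
data = resp.json()

# Ollama reports its own timings (in nanoseconds) in the non-streamed response,
# which makes it easy to compare raw model speed against what the plugin delivers.
tok_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"load: {data['load_duration'] / 1e9:.1f}s  "
      f"prompt tokens: {data['prompt_eval_count']}  "
      f"generation: {tok_per_s:.1f} tok/s")
print(data["response"][:500])
```

If the direct call is fast but the same model is still slow through Copilot, the difference is likely in the request the plugin builds (for example a much larger context window or a long system prompt) rather than in Ollama itself.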
I also used the bge-3 model to vectorize my notes, but the results were unsatisfactory. In contrast, the same models in CherryStudio performed normally.
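The embedding path can be sanity-checked the same way. The sketch below assumes the embedding model was pulled under the tag `bge-m3` (adjust to whatever `ollama list` shows for your install) and calls Ollama's embeddings endpoint directly, to confirm the model returns vectors at a reasonable speed before the plugin's indexing is blamed.

```python
import time
import requests

# Request one embedding directly from Ollama; the model tag is an assumption.
start = time.time()
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "bge-m3", "prompt": "A short test sentence from my notes."},
    timeout=120,
)
vec = resp.json()["embedding"]
print(f"dimension: {len(vec)}, took {time.time() - start:.2f}s")
```

If each call like this returns quickly, slow or poor vault indexing is more likely down to how the notes are chunked and how many chunks are produced than to the embedding model itself.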
I don’t know why this happened. AI suggested it might be due to plugin conflicts, but I couldn’t find any related issues.
I switched to a different model, gpt-oss:20b. The first few responses were normal, but then it started acting like this.
I will try to repro on my side