Niq Dudfield

Results: 309 comments of Niq Dudfield

Apparently the embeddings don't use the entire weights, so maybe there's a way. I'm very fuzzy on how those are created. I patched the proxy server to allow CORS, but...

```
const vectorStore = await MemoryVectorStore.fromDocuments(
  documents,
  new OllamaEmbeddings({
    baseUrl: OLLAMA_BASE_URL,
    model: OLLAMA_MODEL,
  }),
);
```

I think fromDocuments/OllamaEmbeddings runs serially anyway, so it may need some wrapper beyond the server...
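If the embedding calls really do run one at a time, the "wrapper" mentioned above could be a concurrency-limited map: fire several requests at once, but cap how many are in flight. This is a generic sketch, not from the original; `embedOne` below is a hypothetical stand-in for whatever actually POSTs to `/api/embeddings`.

```javascript
// Run fn over items with at most `limit` calls in flight at once,
// preserving input order in the results array.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index synchronously, then await
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    worker,
  );
  await Promise.all(workers);
  return results;
}

// Example with a fake per-string embedder (real one would hit the server).
const embedOne = async (s) => [s.length];
mapWithConcurrency(['a', 'bb', 'ccc'], 2, embedOne).then((vecs) => {
  // vecs preserves input order regardless of which request finished first
});
```

The worker-pool shape keeps the server from being flooded: with `limit` workers, a new request starts only when a previous one resolves.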

```
[GIN] 2024/01/31 - 10:15:31 | 200 | 1.460126417s | 127.0.0.1 | POST "/api/embeddings"
127.0.0.1 - - [31/Jan/2024 10:15:31] "POST /api/embeddings HTTP/1.1" 200 -
127.0.0.1 - - [31/Jan/2024 10:15:31] "POST...
```

Probably worth investigating yourself. I really like the magic / "just works" aspect of using RAG/embeddings, but it would be nice if it were a bit faster /somehow/

@andrewnguonly

> If there's a comparable Wikipedia article,
> how large is the content in your testing

I was kind of using random pages

> it seems to be constant...

I hacked the OllamaEmbeddings class (just the compiled code in node_modules):

```javascript
async _embed(strings) {
  console.log('hack is working!!')
  const embeddings = [];
  for await (const prompt of strings) {
    const...
```
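Rather than editing compiled files in node_modules, the same override could live in a subclass. The sketch below is self-contained: a stub stands in for the real OllamaEmbeddings (whose internal per-prompt request method may be named differently; `_request` here is a placeholder), and the subclass replaces the serial `for await` loop with parallel requests.

```javascript
// Stub mirroring the shape of the hacked class above; NOT the real library.
class OllamaEmbeddingsStub {
  async _request(prompt) {
    // Stand-in for the POST to /api/embeddings.
    return [prompt.length];
  }
  async _embed(strings) {
    const embeddings = [];
    for await (const prompt of strings) {
      embeddings.push(await this._request(prompt)); // serial, as in the hack
    }
    return embeddings;
  }
}

// Subclass override: all prompts in flight at once. In practice you'd
// likely want a concurrency cap so the server isn't overwhelmed.
class ParallelEmbeddings extends OllamaEmbeddingsStub {
  async _embed(strings) {
    return Promise.all(strings.map((prompt) => this._request(prompt)));
  }
}
```

The advantage over patching node_modules is that the change survives a reinstall and can be passed to `MemoryVectorStore.fromDocuments` in place of the original class.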

Maybe we should start a branch if we want to look at this seriously, but anyway, here are some more of the artifacts of my earlier investigations. The ini file I used:

```
[DefaultServer]...
```

> how large is the content in your testing

Enough that there were a lot of embedding requests anyway. This was one of the pages: https://news.ycombinator.com/item?id=39197619 But I suspect it's...

I would try hacking a separate pool of servers just for the embeddings, with the proxy running on a non-default port. I'm still not sure WHEN the full model weights...
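The "separate pool of servers" idea could be as simple as round-robin over a list of base URLs, one per embedding-only instance. The ports below are hypothetical; each would be a separate ollama (or proxy) process:

```javascript
// Round-robin picker over a pool of embedding-only servers.
function makeRoundRobin(baseUrls) {
  let i = 0;
  return () => baseUrls[i++ % baseUrls.length];
}

// Hypothetical pool: three instances on non-default ports.
const nextServer = makeRoundRobin([
  'http://127.0.0.1:11435',
  'http://127.0.0.1:11436',
  'http://127.0.0.1:11437',
]);

// Each embedding request would then use nextServer() as its baseUrl.
```

Combined with a parallel `_embed`, this spreads simultaneous requests across instances instead of queuing them all on one.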

In any case, it didn't seem to help much in the big picture. Maybe you can tweak the threading settings for each ollama instance or something.