llama-node

Code only using 4 CPUs when I have 16 CPUs

Open • gaurav-cointab opened this issue 1 year ago • 3 comments

This is the code that I am using:

```js
import { RetrievalQAChain } from 'langchain/chains';
import { HNSWLib } from 'langchain/vectorstores';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { LLamaEmbeddings } from 'llama-node/dist/extensions/langchain.js';
import { LLM } from 'llama-node';
import { LLamaCpp } from 'llama-node/dist/llm/llama-cpp.js';
import * as fs from 'fs';
import * as path from 'path';

const txtFilename = 'TrainData';
const txtPath = `./${txtFilename}.txt`;
const VECTOR_STORE_PATH = `${txtFilename}.index`;
const model = path.resolve(process.cwd(), './h2ogptq-oasst1-512-30B.ggml.q5_1.bin');

const llama = new LLM(LLamaCpp);

const config = {
  path: model,
  enableLogging: true,
  nCtx: 1024,
  nParts: -1,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: true,
  useMmap: true,
};

let vectorStore;

const run = async () => {
  await llama.load(config);

  if (fs.existsSync(VECTOR_STORE_PATH)) {
    console.log('Vector Exists..');
    vectorStore = await HNSWLib.fromExistingIndex(
      VECTOR_STORE_PATH,
      new LLamaEmbeddings({ maxConcurrency: 1 }, llama)
    );
  } else {
    console.log('Creating Documents');
    const text = fs.readFileSync(txtPath, 'utf8');
    const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
    const docs = await textSplitter.createDocuments([text]);

    console.log('Creating Vector');
    vectorStore = await HNSWLib.fromDocuments(
      docs,
      new LLamaEmbeddings({ maxConcurrency: 1 }, llama)
    );
    await vectorStore.save(VECTOR_STORE_PATH);
  }

  console.log('Testing Vector via Similarity Search');
  const resultOne = await vectorStore.similaritySearch('what is a template', 1);
  console.log(resultOne);

  console.log('Testing Vector via RetrievalQAChain');
  const chain = RetrievalQAChain.fromLLM(llama, vectorStore.asRetriever());
  const res = await chain.call({ query: 'what is a template' });
  console.log({ res });
};

run();
```

It only uses 4 CPUs during `vectorStore = await HNSWLib.fromDocuments(docs, new LLamaEmbeddings({ maxConcurrency: 1 }, llama));`.

Is there anything we can change so it uses more than 4 CPUs?

gaurav-cointab • May 16 '23 18:05
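A likely cause of the 4-CPU ceiling: llama.cpp builds of that era defaulted the eval thread count to min(4, hardware_concurrency), so a single inference call would never use more than 4 cores unless overridden. llama-node's invocation parameters appear to include an `nThreads` field (it shows up in the library's own embedding example); whether it is honored along the LangChain `LLamaEmbeddings` path is an assumption here, so this sketch calls `getEmbedding` directly:

```js
// Sketch, not verified against this llama-node version: probe whether a
// higher nThreads raises core usage for a single embedding call.
// Assumes `llama` is already loaded with `embedding: true` as above.
const probeThreads = async () => {
  const embedding = await llama.getEmbedding({
    prompt: 'what is a template',
    nThreads: 16,      // assumption: forwarded to llama.cpp's n_threads
    nTokPredict: 2048, // sampling fields mirror the library's embedding example
    topK: 40,
    topP: 0.1,
    temp: 0.2,
    repeatPenalty: 1,
  });
  console.log('embedding dims:', embedding.length);
};
```

If that single call saturates more cores, the bottleneck is the per-call thread default rather than a lack of parallel calls.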

Not yet. llama.cpp does not seem to support parallel inference at the moment. I may look into other approaches (like implementing round-robin scheduling at the Rust level) for this.

hlhr202 • May 17 '23 09:05
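For anyone landing here later: until parallel inference exists inside one instance, a user-land approximation of the round-robin idea is to load N independent model instances and shard `embedDocuments` across them. This is only a sketch under assumptions: `RoundRobinEmbeddings` is a hypothetical helper (not part of llama-node or LangChain), it costs roughly N times the model's RAM, and it only helps if llama-node runs each instance's native inference off the main JS thread.

```js
import { HNSWLib } from 'langchain/vectorstores';
import { LLamaEmbeddings } from 'llama-node/dist/extensions/langchain.js';
import { LLM } from 'llama-node';
import { LLamaCpp } from 'llama-node/dist/llm/llama-cpp.js';

// Hypothetical helper: deals texts out round-robin across several
// LLamaEmbeddings instances, then stitches the vectors back in order.
class RoundRobinEmbeddings {
  constructor(embedders) {
    this.embedders = embedders; // one per loaded model instance
  }

  async embedDocuments(texts) {
    const shards = this.embedders.map(() => []);
    texts.forEach((text, i) => shards[i % shards.length].push({ text, i }));
    const vectors = new Array(texts.length);
    await Promise.all(
      shards.map(async (shard, s) => {
        const embedded = await this.embedders[s].embedDocuments(shard.map((x) => x.text));
        shard.forEach((x, j) => { vectors[x.i] = embedded[j]; });
      })
    );
    return vectors;
  }

  embedQuery(text) {
    // Single queries go to the first instance.
    return this.embedders[0].embedQuery(text);
  }
}

// Usage (inside an async function, reusing `config` and `docs` from the
// snippet above; N copies of the model means roughly N× the RAM):
const buildStore = async (docs, config, N = 4) => {
  const embedders = await Promise.all(
    Array.from({ length: N }, async () => {
      const llm = new LLM(LLamaCpp);
      await llm.load(config);
      return new LLamaEmbeddings({ maxConcurrency: 1 }, llm);
    })
  );
  return HNSWLib.fromDocuments(docs, new RoundRobinEmbeddings(embedders));
};
```

LangChain's `HNSWLib` only ever calls `embedDocuments`/`embedQuery`, so duck typing should work in plain JS; in TypeScript you would extend the `Embeddings` base class instead.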

@hlhr202 any updates on that?

pavelpiha • Jul 25 '23 14:07

@hlhr202 u gotta hire us when you make it big :) LGTM

HolmesDomain • Jul 26 '23 20:07