
Can't run example on llama-2-13b-chat q4_0

gioragutt opened this issue 10 months ago · 2 comments

I apologize in advance if I omit any useful details; I'm just a simple dev with no knowledge of or background in DS, so I'm in trial-and-error land.

I followed the llama.cpp instructions for the llama-2-13b-chat model, and I now have the q4_0 file: llama-2-13b-chat/ggml-model-q4_0.gguf.

I'm using the example code from this repo, changed only to point at my model file, but loading fails:

The code:

import { LLM } from 'llama-node';
import { LLamaCpp } from 'llama-node/dist/llm/llama-cpp.js';
import path from 'path';

const model = path.resolve(
	process.cwd(),
	'../llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf',
);

console.log(model);

const llama = new LLM(LLamaCpp);
/** @type {import('llama-node/dist/llm/llama-cpp').LoadConfig} */
const config = {
	modelPath: model,
	enableLogging: true,
	nCtx: 1024,
	seed: 0,
	f16Kv: false,
	logitsAll: false,
	vocabOnly: false,
	useMlock: false,
	embedding: false,
	useMmap: true,
	nGpuLayers: 128,
};

const template = `How are you?`;
const prompt = `A chat between a user and an assistant.
USER: ${template}
ASSISTANT:`;

const params = {
	nThreads: 4,
	nTokPredict: 2048,
	topK: 40,
	topP: 0.1,
	temp: 0.2,
	repeatPenalty: 1,
	prompt,
};

const run = async () => {
	await llama.load(config);

	await llama.createCompletion(params, response => {
		process.stdout.write(response.token);
	});
};

run();

The error:

Debugger listening on ws://127.0.0.1:59899/c72280cb-a098-4c15-859f-54025e513896
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
/Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf
llama.cpp: loading model from /Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf
error loading model: unknown (magic, version) combination: 46554747, 00000001; is this really a GGML file?
llama_init_from_file: failed to load model
Waiting for the debugger to disconnect...
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf] {
  code: 'GenericFailure'
}

Node.js v18.17.1

I can see that the error refers to a (magic, version) combination it doesn't expect in the file (error loading model: unknown (magic, version) combination: 46554747, 00000001; is this really a GGML file?), and I notice that my file is a gguf file, not a ggml one.

From a quick google search, I got to this post on r/LocalLLaMA which states that GGUF is essentially the successor to GGML.
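In case it helps anyone debugging the same thing: 46554747 is just the bytes 'GGUF' read as a little-endian 32-bit integer, so the loader is saying it found a GGUF file where it expected one of the older GGML magics. A minimal sketch (plain Node, nothing llama-node-specific) to check a file's magic yourself:

// magic-check.mjs — print the first 4 bytes of a model file to see its format.
// Usage: node magic-check.mjs /path/to/model-file
import { openSync, readSync, closeSync } from 'fs';

const file = process.argv[2];
const buf = Buffer.alloc(4);
const fd = openSync(file, 'r');
readSync(fd, buf, 0, 4, 0);
closeSync(fd);

// GGUF files print 'GGUF' here; older GGML-family files print things like 'ggjt'
console.log('magic (ascii):', buf.toString('ascii'));
// read as little-endian u32 this is 46554747 for GGUF — the value from the error
console.log('magic (le u32):', buf.readUInt32LE(0).toString(16));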

I have literally zero understanding of what I'm doing, and would appreciate it if someone could point me in some direction on how to deal with this. Even just pointing out keywords I might have missed that could have led me to a better answer in the first place 😅

Thanks in advance for your time!

gioragutt avatar Aug 26 '23 19:08 gioragutt

Exact same issue here. Did you manage to find a workaround? I might be wrong, but it doesn't look like this library's llama-cpp has been updated in ~4 months; I wonder if that's the issue.
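A quick way to check what you actually have installed (a rough sketch; it reads package.json straight out of node_modules, so adjust the path to your project layout):

// version-check.mjs — prints the installed llama-node version
// (the node_modules path is an assumption; adjust if yours lives elsewhere)
import { readFileSync } from 'fs';

const pkg = JSON.parse(
	readFileSync('./node_modules/llama-node/package.json', 'utf8'),
);
console.log('llama-node version:', pkg.version);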

Noodle-Bug avatar Sep 04 '23 03:09 Noodle-Bug

Is there a way to drop in a newer version of llama.cpp?

dseeker avatar Sep 22 '23 14:09 dseeker