
Does it support the new GGMLv3 quantization methods?

Open Exotik850 opened this issue 1 year ago • 5 comments

Tried using the CLI application to see how far it has come since being llama-rs, and noticed that an error popped up when using one of the newer WizardLM uncensored models quantized with the GGMLv3 method:

llm llama chat --model-path .\Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin
⣾ Loading model...Error:
   0: Could not load model
   1: invalid file format version 3

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Am I using it the wrong way or is it not supported yet?

Exotik850 avatar May 29 '23 18:05 Exotik850
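The "invalid file format version 3" error above is the shape of a loader-side header check: read a magic number and a format version, and reject versions the build doesn't know about. A minimal sketch of that kind of check (the magic value, header layout, and version cutoff here are simplified assumptions for illustration, not the actual GGML specification or `llm`'s real loader):

```rust
use std::io::{Cursor, Read};

// Assumed container magic ("ggjt" as little-endian bytes) and the highest
// format version this hypothetical build understands.
const MAGIC: u32 = 0x6767_6a74;
const MAX_SUPPORTED_VERSION: u32 = 2;

// Read the magic and version from the start of a model file and reject
// anything newer than this build supports.
fn check_header(mut r: impl Read) -> Result<u32, String> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf).map_err(|e| e.to_string())?;
    if u32::from_le_bytes(buf) != MAGIC {
        return Err("not a GGML file".into());
    }
    r.read_exact(&mut buf).map_err(|e| e.to_string())?;
    let version = u32::from_le_bytes(buf);
    if version > MAX_SUPPORTED_VERSION {
        return Err(format!("invalid file format version {version}"));
    }
    Ok(version)
}

fn main() {
    // A fake v3 header: magic followed by version 3, little-endian.
    let mut header = Vec::new();
    header.extend_from_slice(&MAGIC.to_le_bytes());
    header.extend_from_slice(&3u32.to_le_bytes());
    match check_header(Cursor::new(header)) {
        Ok(v) => println!("loaded version {v}"),
        Err(e) => println!("Error: {e}"),
    }
}
```

A build whose `MAX_SUPPORTED_VERSION` is too low fails exactly this way, which is why the fix is upgrading the loader rather than the model file.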

Hi there! Yes, it's supported, but only on the latest version (main) - we haven't cut a new release yet. Hope to have that sorted soon!

philpax avatar May 29 '23 19:05 philpax
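For anyone else hitting this before a release is cut: a Rust project can depend on the unreleased main branch directly via a git dependency. A hedged sketch (the crate name `llm` is taken from this repository's name and may not match the actual package layout):

```toml
# Cargo.toml: pull the unreleased code straight from the repository
[dependencies]
llm = { git = "https://github.com/rustformers/llm", branch = "main" }
```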

My apologies, should've tried the main branch instead of just trying the release 😅

Exotik850 avatar May 30 '23 15:05 Exotik850

No worries - I'll keep this up for now and pin it for people's reference until we get it out the door :)

philpax avatar May 31 '23 18:05 philpax

@philpax have you considered making some 0.2.0-beta.1 etc. releases on crates.io? This pattern has worked very well for some of my own projects in the past.

arctic-hen7 avatar Aug 19 '23 01:08 arctic-hen7
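For reference, a pre-release on crates.io is just a semver suffix in the package manifest; Cargo treats `-beta.N` versions as pre-releases, so ordinary `^0.2` requirements will not resolve to them unless a user opts in explicitly. A rough sketch (version numbers hypothetical):

```toml
# Cargo.toml of the library, published as a pre-release
[package]
name = "llm"
version = "0.2.0-beta.1"

# Downstream users must name the pre-release to get it:
# [dependencies]
# llm = "0.2.0-beta.1"
```

This is what makes the pattern safe: stable users never see the beta by accident, while early adopters can track the new interface.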

Hi there! Yeah, I've considered it, but the main blocker is https://github.com/rustformers/llm/issues/221 - I don't want to cut a release where the interface is going to be radically different in the next release. I'm hoping to have this all closed out within the next week or two, especially with GGUF on the horizon, but I've been quite busy.

philpax avatar Aug 21 '23 07:08 philpax