
Convert `quantize.cpp` to Rust

philpax opened this issue 1 year ago

Split this off from #21 as it's a separate issue.

This should be relatively straightforward - it reads in the original ggml model, runs the quantization functions over the data, and writes it out to disk.
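As a rough illustration of the quantization step, here's a minimal sketch in Rust. The block-wise absmax quantizer below is a toy stand-in, not ggml's actual Q4_0 layout or any existing function in this repo:

```rust
// Toy block-wise absmax quantizer, standing in for the real ggml routines.
fn quantize_block(block: &[f32]) -> (f32, Vec<u8>) {
    // Scale so the largest magnitude maps to the edge of the 4-bit range.
    let absmax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
    let quants = block
        .iter()
        .map(|&x| ((x / scale).round() as i8 + 8) as u8)
        .collect();
    (scale, quants)
}

fn main() {
    // Pretend this is one tensor's worth of f32 weights read from the model.
    let weights = vec![0.25f32; 64];
    let quantized: Vec<(f32, Vec<u8>)> =
        weights.chunks(32).map(quantize_block).collect();
    println!("{} blocks quantized", quantized.len());
}
```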

The exciting possibility is parallelisation 👀 - all you should have to do is scan through the file to determine the tensor boundaries, build an iterator over the tensors, and feed it to rayon. That would be a huge improvement over the C++ version, and it would be practically free!
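A minimal sketch of that idea, assuming the tensors have already been located by scanning the file (here they're just fabricated in memory). `quantize_tensor` is a hypothetical placeholder for the real per-tensor routine, and the example needs the `rayon` crate:

```rust
use rayon::prelude::*;

// Hypothetical per-tensor quantization routine (placeholder logic only).
fn quantize_tensor(data: &[f32]) -> Vec<u8> {
    data.iter()
        .map(|&x| (x.clamp(-1.0, 1.0) * 127.0) as i8 as u8)
        .collect()
}

fn main() {
    // Pretend these came from scanning the file for tensor boundaries.
    let tensors: Vec<Vec<f32>> = (0..8).map(|_| vec![0.5f32; 4096]).collect();

    // Each tensor is independent, so rayon can quantize them in parallel.
    let quantized: Vec<Vec<u8>> = tensors
        .par_iter()
        .map(|data| quantize_tensor(data))
        .collect();

    println!("quantized {} tensors", quantized.len());
}
```

The only sequential part left would be writing the results back out in the original tensor order, which rayon's `collect` already preserves.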

philpax, Mar 18 '23

If there's nobody working on this, I could tackle it during the week.

FloppyDisck, Mar 19 '23

Is there currently a way to convert models to ggml format? I'm close to getting quantize into a working demo and was wondering if this should also be ported for the PR.

FloppyDisck, Mar 25 '23

Nope, that still requires the original Python code. If you want to tackle #21 as well, that would be awesome!

philpax, Mar 25 '23

I have no problem working on it after I finish this issue.

Another question: how should the feature be used? Should it be another argument in the CLI app?

FloppyDisck, Mar 25 '23

Hm, just do the simplest possible thing for now and we'll figure out a new CLI. There are several changes landing in the CLI soon, so we should avoid doing anything complicated until that's entirely resolved.

philpax, Mar 25 '23