Convert `quantize.cpp` to Rust
Split this off from #21 as it's a separate issue.
This should be relatively straightforward - it reads in the original ggml
model, runs the quantization functions over the data, and writes it out to disk.
The exciting possibility is parallelisation 👀 - all you should have to do is scan through the file to determine the tensor boundaries, then build an iterator from them and feed it to rayon. It would be a huge improvement over the C++ version, and it would be practically free!
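To illustrate the idea, here's a minimal sketch of that pattern. The tensor data, boundaries, and the `quantize_tensor` function are all hypothetical stand-ins (the real quantization functions live in ggml); it uses `std::thread::scope` so it's self-contained, but rayon's `par_iter().map(...).collect()` would do the same thing with a work-stealing pool:

```rust
use std::thread;

// Hypothetical per-tensor quantizer: naive absmax scaling of f32 -> i8.
// The real implementation would call the ggml quantization routines.
fn quantize_tensor(data: &[f32]) -> Vec<i8> {
    let max = data.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max == 0.0 { 1.0 } else { 127.0 / max };
    data.iter().map(|v| (v * scale).round() as i8).collect()
}

fn main() {
    // Pretend we've scanned the file and split it at the tensor boundaries.
    let tensors: Vec<Vec<f32>> = vec![
        vec![0.1, -0.5, 1.0],
        vec![2.0, -2.0],
        vec![0.0; 4],
    ];

    // Each tensor is independent, so quantize them concurrently.
    // With rayon this would be: tensors.par_iter().map(|t| quantize_tensor(t)).collect()
    let quantized: Vec<Vec<i8>> = thread::scope(|s| {
        let handles: Vec<_> = tensors
            .iter()
            .map(|t| s.spawn(move || quantize_tensor(t)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Results come back in the original tensor order, so they can be
    // written out to disk sequentially.
    for q in &quantized {
        println!("{:?}", q);
    }
}
```

The key property that makes this "practically free" is that quantizing one tensor never depends on another, so the only sequential parts are the boundary scan and the final write.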
If there's nobody working on this, I could tackle it this week.
Is there currently a way to convert models to ggml format? I'm close to getting quantize into a working demo and was wondering if this should also be ported for the PR.
Nope, that still requires the original Python code. If you want to tackle #21 as well, that would be awesome!
I have no problem working on it after I finish this issue.
Another question: how should the feature be used? Should it be another argument in the CLI app?
Hm, just do the simplest possible thing for now and we'll figure out a new CLI later. There are several changes landing to the CLI soon, so we should avoid doing anything complicated until that's entirely resolved.