Lukas Kreussel
I don't think I understand your question completely, but if you want to use an external tokenizer you can simply provide the path or model name of the tokenizer you want...
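For illustration, an invocation with an external tokenizer might look like the sketch below. The flag names (`--tokenizer-path`, `--tokenizer-repository`), architecture, and file names are assumptions and may differ between versions of the CLI:

```shell
# Sketch only: flag names, architecture, and paths are assumptions.
# Either point at a local tokenizer file...
llm infer -a llama -m ./model-q4_0.bin --tokenizer-path ./tokenizer.json -p "Hello"
# ...or name a HuggingFace repository so the tokenizer is fetched from the hub.
llm infer -a llama -m ./model-q4_0.bin --tokenizer-repository some-org/some-model -p "Hello"
```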
Could you try another quantization format? Maybe q5_1 or one of the K-quants?
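As a sketch, re-quantizing with llama.cpp's `quantize` tool could look like this; the model paths and the exact binary location depend on your checkout and build:

```shell
# Sketch only: paths are assumptions; run from a built llama.cpp checkout.
# Produce a q5_1 quantization from an f16 ggml file:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q5_1.bin q5_1
# Or one of the K-quants:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_k_m.bin q4_k_m
```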
The error seems to be caused by [this](https://github.com/ggerganov/llama.cpp/blob/b7647436ccc80970b44a270f70f4f2ea139054d1/ggml-metal.m#L758-L774) code block in the ggml Metal implementation. We probably have to pull the latest changes into our repo or we have to...
According to https://github.com/ggerganov/llama.cpp/issues/2508, some quantizations are simply not implemented in Metal.
The logging is generated from the ggml side, and there is currently no way to disable it. With the upcoming ggml update it should be gone, but that update is currently unstable...
No, it isn't yet. We would need to port the bigcode example over from the ggml repo. But currently we are working on getting GPU support for all models, which...
@jondot Theoretically you should be able to run it with the `gpt2` architecture, but I haven't tested that yet. If you want, give it a try and let me know...
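An untested sketch of what that invocation might look like; the subcommand, flags, and model file name here are assumptions, not a confirmed interface:

```shell
# Sketch only (untested): force the gpt2 architecture when loading the model.
# The -a/--architecture flag and the model path are assumptions.
llm infer -a gpt2 -m ./model-q4_0.bin -p "fn main() {"
```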
This is probably another issue with the ggml version we currently use; a re-sync with the current main branch of `llama.cpp` is likely needed.
After playing around with GPU acceleration, I believe that the inference code of these models has some errors and accesses uninitialized memory somewhere, meaning the results are a bit corrupted...
We could pull the model downloader from the test package into the cli package and enable loading models from a URL. Then we just need a test harness for the...