Lukas Kreussel
I don't think I understand your question completely, but if you want to use an external tokenizer you can simply provide the path or model name of the tokenizer you want...
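For illustration, an invocation with an external tokenizer might look like the sketch below. The flag names (`--tokenizer-path`, `--tokenizer-repository`), architecture, and file names are assumptions and may differ between versions of the CLI:

```shell
# Sketch only: flag names, architecture, and paths are assumptions.
# Either point at a local tokenizer file...
llm infer -a llama -m ./model-q4_0.bin --tokenizer-path ./tokenizer.json -p "Hello"
# ...or name a HuggingFace repository so the tokenizer is fetched from the hub.
llm infer -a llama -m ./model-q4_0.bin --tokenizer-repository some-org/some-model -p "Hello"
```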
Could you try another quantization format? Maybe q5_1 or one of the K-quants?
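As a sketch, re-quantizing with llama.cpp's `quantize` tool could look like this; the model paths and the exact binary location depend on your checkout and build:

```shell
# Sketch only: paths are assumptions; run from a built llama.cpp checkout.
# Produce a q5_1 quantization from an f16 ggml file:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q5_1.bin q5_1
# Or one of the K-quants:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_k_m.bin q4_k_m
```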
The error seems to be caused by [this](https://github.com/ggerganov/llama.cpp/blob/b7647436ccc80970b44a270f70f4f2ea139054d1/ggml-metal.m#L758-L774) code block in the ggml Metal implementation. We probably have to pull the latest changes into our repo or we have to...
According to https://github.com/ggerganov/llama.cpp/issues/2508, some quantizations are simply not implemented in Metal.
The logging is generated from the ggml side, and there is currently no way to disable it. With the upcoming ggml update it should be gone, but that update is currently unstable...
No, it isn't yet. We would need to port the bigcode example over from the ggml repo. But currently we are working on getting GPU support for all models, which...
@jondot Theoretically you should be able to run it with the `gpt2` architecture, but I haven't tested that yet. If you want, give it a try and let me know...
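An untested sketch of what that invocation might look like; the subcommand, flags, and model file name here are assumptions, not a confirmed interface:

```shell
# Sketch only (untested): force the gpt2 architecture when loading the model.
# The -a/--architecture flag and the model path are assumptions.
llm infer -a gpt2 -m ./model-q4_0.bin -p "fn main() {"
```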
This is probably another issue with the ggml version we currently use; a re-sync with the current main branch of `llama.cpp` is likely needed.
After playing around with GPU acceleration, I believe that the inference code of these models has some errors and accesses uninitialized memory somewhere, meaning the results are a bit corrupted...
We could pull the model downloader from the test package into the cli package and enable loading models from a URL. Then we just need a test harness for the...