Lukas Kreussel

> If I were you, I would write a script that packs the GGML weights. This would pack two float16 values into a single u32 value. You could then use...
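
A minimal sketch of that packing step, assuming the `half` crate for the f16 handling (the function names are just illustrative):

```rust
use half::f16;

/// Pack two f16 values into a single u32 (first value in the lower 16 bits).
fn pack_f16_pair(lo: f16, hi: f16) -> u32 {
    (lo.to_bits() as u32) | ((hi.to_bits() as u32) << 16)
}

/// Unpack a u32 back into the two original f16 values.
fn unpack_f16_pair(packed: u32) -> (f16, f16) {
    (
        f16::from_bits((packed & 0xFFFF) as u16),
        f16::from_bits((packed >> 16) as u16),
    )
}

fn main() {
    let packed = pack_f16_pair(f16::from_f32(1.5), f16::from_f32(-0.25));
    let (lo, hi) = unpack_f16_pair(packed);
    println!("{packed:#010x} -> {lo}, {hi}"); // 0xb4003e00 -> 1.5, -0.25
}
```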

> Not sure what you mean by this; as far as I know there are no special provisions for unified memory in WebGPU/wgpu, but I might be mistaken. I...

@pixelspark Here are some converted [RedPajama models](https://huggingface.co/LLukas22/redpajama-ggml) which should work with the latest `main` branch (I haven't created the README yet). I can also recommend [MPT](https://huggingface.co/LLukas22/mpt-7b-ggml)-based models, which are...

Sorry, the repository was still private. But I would still recommend using MPT, as some GptNeoX-based models (including RedPajama) have problems with added BOS tokens (see https://github.com/rustformers/llm/pull/270).
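
Roughly what goes wrong, with made-up token ids (the real ids depend on the tokenizer):

```rust
// Illustration only; the token ids are made up. The model receives a BOS
// token it was never trained to see, so its first predictions come out garbled.
fn main() {
    let bos_id: u32 = 0; // hypothetical BOS id from the tokenizer config
    let prompt: Vec<u32> = vec![5613, 310, 253]; // hypothetical prompt encoding

    // What the CLI currently feeds the model:
    let fed: Vec<u32> = std::iter::once(bos_id).chain(prompt.iter().copied()).collect();

    println!("fed:      {fed:?}"); // [0, 5613, 310, 253]
    println!("expected: {prompt:?}"); // [5613, 310, 253]
}
```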

@pixelspark GGML recently updated its quantization format (see https://github.com/ggerganov/llama.cpp/pull/1508). Yesterday these changes were merged into `llm`. This means all quantized models (marked with `qX_Y`) need to be reconverted. Currently I'm...

Alright, the models are converted and uploaded. I also added Pythia models, which are smaller GptNeoX models we could use for development.
RedPajama: https://huggingface.co/Rustformers/redpajama-ggml
Pythia: https://huggingface.co/Rustformers/pythia-ggml

Hm, strange; I'm using the exact same model and the same git revision and it's working as expected (the first few tokens are garbled for RedPajama because of the BOS issue). Maybe...

As previously mentioned, the strange results from RedPajama are expected, as the CLI currently uses the wrong BOS token. Tbh, I don't exactly know what the `chat` feature in the...

@NeroHin

> IBM's Granite series Code Models.
>
> [Granite Code Models](https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330)

The `3b` and `8b` variants should already be supported, as they are just based on the `llama` architecture....

Generating normal `dense` embeddings works fine because `bge-m3` is just a regular `XLM-Roberta` model. The problem is that there's no way to use the `sparse` or `colbert` features of this model...
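
For context, the `sparse` output isn't produced by the plain encoder: per the M3 paper, the per-token lexical weight comes from an extra linear head over the hidden states, roughly like this sketch (names, shapes and values are illustrative assumptions):

```rust
// A rough sketch (not actual model code) of how bge-m3 derives its `sparse`
// lexical weights: an extra linear head maps each token's hidden state to a
// scalar, followed by ReLU. A plain XLM-Roberta forward pass never touches
// this head, which is why only `dense` embeddings come out of it.
fn sparse_weights(hidden_states: &[Vec<f32>], head_w: &[f32], head_b: f32) -> Vec<f32> {
    hidden_states
        .iter()
        .map(|h| {
            let z: f32 = h.iter().zip(head_w).map(|(x, w)| x * w).sum::<f32>() + head_b;
            z.max(0.0) // ReLU: only positive lexical weights survive
        })
        .collect()
}

fn main() {
    // Two fake "tokens" with a hidden size of 4, purely for illustration.
    let hidden = vec![vec![0.5, -1.0, 0.25, 2.0], vec![-0.5, 0.1, 0.0, -2.0]];
    let head_w = [0.3, -0.2, 0.7, 0.1];
    println!("{:?}", sparse_weights(&hidden, &head_w, 0.0)); // [0.725, 0.0]
}
```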