Lukas Kreussel

> If I were you, I would write a script that packs the GGML weights. This would pack two float16 values into a single u32 value. You could then use...
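
A minimal sketch of that packing step, assuming the `half` crate for the f16 handling (the function names are just illustrative):

```rust
use half::f16;

/// Pack two f16 values into a single u32 (first value in the lower 16 bits).
fn pack_f16_pair(lo: f16, hi: f16) -> u32 {
    (lo.to_bits() as u32) | ((hi.to_bits() as u32) << 16)
}

/// Unpack a u32 back into the two original f16 values.
fn unpack_f16_pair(packed: u32) -> (f16, f16) {
    (
        f16::from_bits((packed & 0xFFFF) as u16),
        f16::from_bits((packed >> 16) as u16),
    )
}

fn main() {
    let packed = pack_f16_pair(f16::from_f32(1.5), f16::from_f32(-0.25));
    let (lo, hi) = unpack_f16_pair(packed);
    println!("{packed:#010x} -> {lo}, {hi}"); // 0xb4003e00 -> 1.5, -0.25
}
```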

> Not sure what you mean by this; as far as I know there are no special provisions for unified memory in WebGPU/wgpu, but I might be mistaken. I...

@pixelspark Here are some converted [RedPajama models](https://huggingface.co/LLukas22/redpajama-ggml) which should work with the latest `main` branch (I haven't created the README yet). I can also recommend [MPT](https://huggingface.co/LLukas22/mpt-7b-ggml)-based models, which are...

Sorry, the repository was still private. But I would still recommend using MPT, as some GptNeoX-based models (including RedPajama) have problems with added BOS tokens (see https://github.com/rustformers/llm/pull/270).
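
Roughly what goes wrong, with made-up token ids (the real ids depend on the tokenizer):

```rust
// Illustration only; the token ids are made up. The model receives a BOS
// token it was never trained to see, so its first predictions come out garbled.
fn main() {
    let bos_id: u32 = 0; // hypothetical BOS id from the tokenizer config
    let prompt: Vec<u32> = vec![5613, 310, 253]; // hypothetical prompt encoding

    // What the CLI currently feeds the model:
    let fed: Vec<u32> = std::iter::once(bos_id).chain(prompt.iter().copied()).collect();

    println!("fed:      {fed:?}"); // [0, 5613, 310, 253]
    println!("expected: {prompt:?}"); // [5613, 310, 253]
}
```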

@pixelspark GGML recently updated its quantization format (see https://github.com/ggerganov/llama.cpp/pull/1508). Yesterday these changes were merged into `llm`. This means all quantized models (marked with `qX_Y`) need to be reconverted. Currently I'm...

Alright, the models are converted and uploaded. I also added Pythia models, which are smaller GptNeoX models we could use for development.
RedPajama: https://huggingface.co/Rustformers/redpajama-ggml
Pythia: https://huggingface.co/Rustformers/pythia-ggml

Hm, strange; I'm using the exact same model and the same git revision and it's working as expected (the first few tokens are garbled for RedPajama because of the BOS issue). Maybe...

As previously mentioned, the strange results from RedPajama are expected, as the CLI currently uses the wrong BOS token. Tbh, I don't exactly know what the `chat` feature in the...

@NeroHin

> IBM's Granite series Code Models.
>
> [Granite Code Models](https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330)

The `3b` and `8b` variants should already be supported, as they are just based on the `llama` architecture....

Generating normal `dense` embeddings works fine because `bge-m3` is just a regular `XLM-Roberta` model. The problem is that there's no way to use the `sparse` or `colbert` features of this model...
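
For context, the `sparse` output isn't produced by the plain encoder: per the M3 paper, the per-token lexical weight comes from an extra linear head over the hidden states, roughly like this sketch (names, shapes and values are illustrative assumptions):

```rust
// A rough sketch (not actual model code) of how bge-m3 derives its `sparse`
// lexical weights: an extra linear head maps each token's hidden state to a
// scalar, followed by ReLU. A plain XLM-Roberta forward pass never touches
// this head, which is why only `dense` embeddings come out of it.
fn sparse_weights(hidden_states: &[Vec<f32>], head_w: &[f32], head_b: f32) -> Vec<f32> {
    hidden_states
        .iter()
        .map(|h| {
            let z: f32 = h.iter().zip(head_w).map(|(x, w)| x * w).sum::<f32>() + head_b;
            z.max(0.0) // ReLU: only positive lexical weights survive
        })
        .collect()
}

fn main() {
    // Two fake "tokens" with a hidden size of 4, purely for illustration.
    let hidden = vec![vec![0.5, -1.0, 0.25, 2.0], vec![-0.5, 0.1, 0.0, -2.0]];
    let head_w = [0.3, -0.2, 0.7, 0.1];
    println!("{:?}", sparse_weights(&hidden, &head_w, 0.0)); // [0.725, 0.0]
}
```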