Lukas Kreussel
@EricLBuehler Thanks for adding this. My main use case for `mistral.rs` is using it as an async server alternative to `ollama`, and I can only provide my opinions on the server...
I'll look into it. `mistralrs-core` now also seems to depend on `pyo3`, so I also have to add Python to the builder containers.
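For reference, a minimal sketch of what the builder stage could look like; the base image, paths, and package names are assumptions on my side, not the project's actual Dockerfiles:

```dockerfile
# Illustrative builder stage only; the real Dockerfiles differ.
FROM rust:1-slim AS builder

# pyo3 needs a Python interpreter and headers at build time
# (assumption: python3-dev is enough for pyo3 to locate Python).
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN cargo build --release
```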
@sammcj Yeah, the default entry point currently only sets the port and `hf_token`. Since there are a lot of options for loading a model into the server, the containers expect...
Correct me if I'm wrong, but from a quick look at the paper, LIMA seems to be a different fine-tuning approach which doesn't modify the underlying model architecture. If...
Maybe we could change the callback to work with the actual tokens instead of the decoded string; that should make detecting the correct stop sequence simpler. Or is there a...
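Something like this minimal sketch is what I have in mind; the function and the loop are hypothetical, not the current callback API:

```rust
/// Hypothetical sketch: match stop sequences on raw token IDs instead of
/// the decoded string, so detection happens exactly on token boundaries.
fn hits_stop_sequence(generated: &[u32], stop: &[u32]) -> bool {
    generated.ends_with(stop)
}

fn main() {
    // Illustrative token IDs; a real stop sequence would come from the tokenizer.
    let stop = [42u32, 7];
    let mut generated: Vec<u32> = Vec::new();

    // Stand-in for the sampling loop: the callback would receive each new
    // token ID and abort generation once the stop sequence appears.
    for tok in [15u32, 3, 42, 7, 99] {
        generated.push(tok);
        if hits_stop_sequence(&generated, &stop) {
            break;
        }
    }
    assert_eq!(generated, vec![15, 3, 42, 7]);
}
```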
See https://github.com/rustformers/llm/pull/325. cuBLAS/CLBlast acceleration isn't currently supported on the main branch, meaning you can build with acceleration enabled but it won't accelerate inference. Also, only `llama`-based models...
Yeah, I was planning to create a table in the "accelerators" docs which shows which architecture can be accelerated by which GPU backend, as it's likely that some models will...
`rustformers` uses `llama.cpp` as its ggml source. Feel free to create a PR including this change; it seems like you only need to adjust the `build.rs` of the `ggml-sys` crate (see the sketch below). I...
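To sketch what I mean (this is illustrative, not the real `ggml-sys` build script; the source paths, the Cargo feature, and the preprocessor define are made-up placeholders):

```rust
// build.rs sketch for a `ggml-sys`-style crate, compiling the vendored
// ggml sources with the `cc` crate (requires `cc` in [build-dependencies]).
fn main() {
    let mut build = cc::Build::new();
    build
        .file("llama.cpp/ggml.c") // assumed path to the vendored source
        .include("llama.cpp");

    // Toggle the changed code path behind a (hypothetical) Cargo feature.
    if std::env::var("CARGO_FEATURE_MY_CHANGE").is_ok() {
        build.define("GGML_MY_CHANGE", None);
    }

    build.compile("ggml");
}
```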
Just adding that I saw the exact same behaviour with the CPU-only image. The problem even seems to get worse if I try to pass in a batch of...
I tested TEI against both an f32 and an f16 model; f16 models seem to be a bit slower than their f32 counterparts, but it's not significant. TEI seems to...