Eric Buehler
@LaurentMazare, what are your thoughts on adding Conv3d to Candle - is it already planned? I would be happy to contribute an implementation.
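For reference, here is a minimal sketch of what the call site could look like if conv3d mirrored Candle's existing `Tensor::conv2d(kernel, padding, stride, dilation, groups)` signature. Note that `conv3d` does not exist in Candle yet - this is purely the shape of the API being proposed:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Input laid out as (batch, c_in, depth, height, width).
    let x = Tensor::randn(0f32, 1f32, (1, 3, 8, 32, 32), &dev)?;
    // Kernel laid out as (c_out, c_in, k_d, k_h, k_w).
    let w = Tensor::randn(0f32, 1f32, (16, 3, 3, 3, 3), &dev)?;
    // Hypothetical method: mirrors conv2d's (kernel, padding, stride, dilation, groups).
    let y = x.conv3d(&w, 1, 1, 1, 1)?;
    println!("{:?}", y.shape()); // (1, 16, 8, 32, 32) with padding 1, stride 1
    Ok(())
}
```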
That sounds like a good idea - I think I will take a stab at it. Do you think I should implement a "toy", bespoke model trained on some 3d...
Hi @msminhas93! Can you please make a flamegraph using https://github.com/flamegraph-rs/flamegraph?
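For anyone unfamiliar, flamegraph-rs installs a cargo subcommand; a typical run looks like this (the binary name and args below are placeholders):

```
cargo install flamegraph
# Enable debug symbols in release builds so the graph has readable frames:
#   [profile.release]
#   debug = true
cargo flamegraph --bin your-binary -- your-args   # writes flamegraph.svg
```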
The Gemma 3 1b model uses a text-only architecture, while the 4b and larger models are vision models. This means that not only is the config different to support this, the weights...
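To illustrate (a rough sketch of the Hugging Face config layout - treat the exact field names as approximate): the 1b ships a flat text config, while 4b and up nest separate text and vision sub-configs:

```
// 1b (text-only)
{ "model_type": "gemma3_text", ... }

// 4b and up (vision)
{ "model_type": "gemma3", "text_config": { ... }, "vision_config": { ... } }
```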
Hi @peterdavidfagan - would love to merge this, please let me know if you can resolve some of the git conflicts!
@sopaco sorry for the delay! I have not investigated this yet, but I think that the way to do this would be to generally follow the installation steps via some...
Hi @peterdavidfagan! Please let me know if you have a chance to take a look again, would love to get this merged.
@chigkim @vietvudanh you can now load UQFF models without downloading the full weights in #849! For example (https://huggingface.co/EricB/Llama-3.2-11B-Vision-Instruct-UQFF):

```
./mistralrs-server -i vision-plain -m EricB/Llama-3.2-11B-Vision-Instruct-UQFF -a vllama --from-uqff llama3.2-vision-instruct-q4k.uqff
```

More...
@LaurentMazare if you could review, that would be great! More benchmarks with some smaller models can be found here: https://github.com/EricLBuehler/mistral.rs/issues/903#issuecomment-2477442513
@null-define without this, Candle Metal prompt performance is significantly reduced: instead of the specialized matrix-matrix kernels, we run the matrix-vector kernels repeatedly, which is slower.
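To illustrate the effect, here is a minimal Candle sketch (run on CPU for portability; the gap is much larger with Metal's kernels) comparing one matrix-matrix multiply against the same work done as repeated matrix-vector multiplies - the sizes are arbitrary:

```rust
use candle_core::{Device, Result, Tensor};
use std::time::Instant;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let (m, k, n) = (1024, 1024, 256);
    let a = Tensor::randn(0f32, 1f32, (m, k), &dev)?;
    let b = Tensor::randn(0f32, 1f32, (k, n), &dev)?;

    // One matrix-matrix multiply: what the specialized kernels handle.
    let t = Instant::now();
    let _y = a.matmul(&b)?;
    println!("matmul:           {:?}", t.elapsed());

    // The same result computed column-by-column as n matrix-vector products,
    // mimicking what happens when only matvec kernels are used.
    let t = Instant::now();
    let mut cols = Vec::with_capacity(n);
    for i in 0..n {
        let col = b.narrow(1, i, 1)?; // (k, 1) column of b
        cols.push(a.matmul(&col)?);   // (m, 1) output column
    }
    let _y2 = Tensor::cat(&cols, 1)?; // reassemble (m, n)
    println!("n x matvec + cat: {:?}", t.elapsed());
    Ok(())
}
```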