Eric Buehler
@LaurentMazare, what are your thoughts on adding Conv3d to Candle - is it already planned? I would be happy to contribute an implementation.
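For reference, here is a minimal sketch of what the call site could look like if conv3d mirrored Candle's existing `Tensor::conv2d(kernel, padding, stride, dilation, groups)` signature. Note that `conv3d` does not exist in Candle yet - this is purely the shape of the API being proposed:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Input laid out as (batch, c_in, depth, height, width).
    let x = Tensor::randn(0f32, 1f32, (1, 3, 8, 32, 32), &dev)?;
    // Kernel laid out as (c_out, c_in, k_d, k_h, k_w).
    let w = Tensor::randn(0f32, 1f32, (16, 3, 3, 3, 3), &dev)?;
    // Hypothetical method: mirrors conv2d's (kernel, padding, stride, dilation, groups).
    let y = x.conv3d(&w, 1, 1, 1, 1)?;
    println!("{:?}", y.shape()); // (1, 16, 8, 32, 32) with padding 1, stride 1
    Ok(())
}
```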
That sounds like a good idea - I think I will take a stab at it. Do you think I should implement a "toy", bespoke model trained on some 3d...
Hi @msminhas93! Can you please make a flamegraph using https://github.com/flamegraph-rs/flamegraph?
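For anyone unfamiliar, flamegraph-rs installs a cargo subcommand; a typical run looks like this (the binary name and args below are placeholders):

```
cargo install flamegraph
# Enable debug symbols in release builds so the graph has readable frames:
#   [profile.release]
#   debug = true
cargo flamegraph --bin your-binary -- your-args   # writes flamegraph.svg
```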
The Gemma 3 1b model uses a text-only architecture, while the 4b and larger models are vision models. This means that not only is the config different to support this, the weights...
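To illustrate (a rough sketch of the Hugging Face config layout - treat the exact field names as approximate): the 1b ships a flat text config, while 4b and up nest separate text and vision sub-configs:

```
// 1b (text-only)
{ "model_type": "gemma3_text", ... }

// 4b and up (vision)
{ "model_type": "gemma3", "text_config": { ... }, "vision_config": { ... } }
```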
Hi @peterdavidfagan - would love to merge this, please let me know if you can resolve some of the git conflicts!
@sopaco sorry for the delay! I have not investigated this yet, but I think that the way to do this would be to generally follow the installation steps via some...
Hi @peterdavidfagan! Please let me know if you have a chance to take a look again, would love to get this merged.
@chigkim @vietvudanh you can now load UQFF models without downloading the full weights in #849! For example (https://huggingface.co/EricB/Llama-3.2-11B-Vision-Instruct-UQFF):

```
./mistralrs-server -i vision-plain -m EricB/Llama-3.2-11B-Vision-Instruct-UQFF -a vllama --from-uqff llama3.2-vision-instruct-q4k.uqff
```

More...
@LaurentMazare if you could review, that would be great! More benchmarks with some smaller models can be found here: https://github.com/EricLBuehler/mistral.rs/issues/903#issuecomment-2477442513
@null-define without this, Candle Metal prompt performance is significantly reduced: instead of the specialized matrix-matrix kernels, we run the matrix-vector kernels repeatedly, which is slower.
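To illustrate the effect, here is a minimal Candle sketch (run on CPU for portability; the gap is much larger with Metal's kernels) comparing one matrix-matrix multiply against the same work done as repeated matrix-vector multiplies - the sizes are arbitrary:

```rust
use candle_core::{Device, Result, Tensor};
use std::time::Instant;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let (m, k, n) = (1024, 1024, 256);
    let a = Tensor::randn(0f32, 1f32, (m, k), &dev)?;
    let b = Tensor::randn(0f32, 1f32, (k, n), &dev)?;

    // One matrix-matrix multiply: what the specialized kernels handle.
    let t = Instant::now();
    let _y = a.matmul(&b)?;
    println!("matmul:           {:?}", t.elapsed());

    // The same result computed column-by-column as n matrix-vector products,
    // mimicking what happens when only matvec kernels are used.
    let t = Instant::now();
    let mut cols = Vec::with_capacity(n);
    for i in 0..n {
        let col = b.narrow(1, i, 1)?; // (k, 1) column of b
        cols.push(a.matmul(&col)?);   // (m, 1) output column
    }
    let _y2 = Tensor::cat(&cols, 1)?; // reassemble (m, n)
    println!("n x matvec + cat: {:?}", t.elapsed());
    Ok(())
}
```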