Eric Buehler
Thank you, @ArthurZucker, for the link! I was able to get the GPT2 conversion working now!
@BHX2, @LLukas22 I just merged #262! You can now use per-request LoRA adapter activation in all APIs. After setting up your adapter model ordering file, you can try it out: [examples...
@gregszumel, thanks for the explanation. I would love to see this added to Candle. If you want to contribute this to mistral.rs please feel free!
@LaurentMazare, thanks! I saw that PR and am very excited for it to be merged.
@gregszumel, that sounds great! If you decide to contribute it to mistral.rs, that would be much appreciated.
For future reference, here's the implementation: https://github.com/EricLBuehler/mistral.rs/blob/6aec940499be1cf72c628f7ddaa8b3e59bcb4fda/mistralrs-core/src/ops.rs#L482-L504
@LaurentMazare, is this a mistake on my part?
For speculative decoding, we need to run the target model on multiple tokens in a single forward pass, once per step. If we need to run the target model with a full prompt,...
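To illustrate the point above, here is a minimal sketch of a greedy speculative-decoding step: the draft model proposes several tokens autoregressively, then the target model verifies all of them at once and we keep the longest agreeing prefix plus the target's correction. The `draft_logit`/`target_logit` functions below are hypothetical toy stand-ins for real model forward passes, not part of mistral.rs or candle.

```rust
// Toy next-token rules standing in for model forward passes.
fn target_logit(prev: u32) -> u32 {
    (prev * 3 + 1) % 97
}

fn draft_logit(prev: u32) -> u32 {
    // The draft agrees with the target except when `prev` is a
    // multiple of 5, simulating an occasional mismatch.
    let t = target_logit(prev);
    if prev % 5 == 0 { (t + 1) % 97 } else { t }
}

/// One speculative step: the draft proposes `k` tokens, then the target
/// evaluates the context plus all drafts in one pass. We accept drafts
/// until the first disagreement, where we take the target's token instead.
/// Returns how many tokens were appended to `ctx`.
fn speculative_step(ctx: &mut Vec<u32>, k: usize) -> usize {
    let start = ctx.len();

    // 1. Draft proposes `k` tokens autoregressively.
    let mut drafts = Vec::with_capacity(k);
    let mut prev = *ctx.last().unwrap();
    for _ in 0..k {
        let d = draft_logit(prev);
        drafts.push(d);
        prev = d;
    }

    // 2. Target verifies every drafted position (conceptually one forward
    //    pass over context + drafts, which is why multi-token input matters).
    let mut accepted = Vec::new();
    let mut prev = *ctx.last().unwrap();
    for &d in &drafts {
        let t = target_logit(prev);
        if t == d {
            accepted.push(d);
            prev = d;
        } else {
            accepted.push(t); // first mismatch: use the target's token
            break;
        }
    }

    // 3. If every draft was accepted, the target's pass also yields one
    //    "bonus" token at the final position.
    if accepted == drafts {
        accepted.push(target_logit(prev));
    }

    ctx.extend(&accepted);
    ctx.len() - start
}

fn main() {
    let mut ctx = vec![1u32];
    let mut produced = 0;
    while produced < 16 {
        produced += speculative_step(&mut ctx, 4);
    }
    println!("{:?}", ctx);
}
```

With greedy acceptance like this, the output sequence is guaranteed to be identical to what the target model alone would produce; the draft only changes how many tokens are committed per target pass.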
Ok. Would this be similar to #2111?
Ok, so just to confirm: it is this part? > https://github.com/huggingface/candle/pull/2111/files#diff-ed262e4bc9a4a093e64842a2f61a85e1713c4efde0618ac7b31ad58dc5d171e3R137-R149 I can add a PR for this to some of the models if you think it is a good...