Eric Buehler
Thank you, @ArthurZucker, for the link! I was able to get the GPT2 conversion working now!
@BHX2, @LLukas22 I just merged #262! You can now use per-request LoRA adapter activation in all APIs. After setting up your adapter model ordering file, you can try it out: [examples...
@gregszumel, thanks for the explanation. I would love to see this added to Candle. If you want to contribute this to mistral.rs please feel free!
@LaurentMazare, thanks! I saw that PR and am very excited for it to be merged.
@gregszumel, that sounds great! If you decide to contribute it to mistral.rs, that would be much appreciated.
For future reference, here's the implementation: https://github.com/EricLBuehler/mistral.rs/blob/6aec940499be1cf72c628f7ddaa8b3e59bcb4fda/mistralrs-core/src/ops.rs#L482-L504
@LaurentMazare, is this a mistake on my part?
For speculative decoding, we need to run the target model on multiple tokens in a single forward pass, once per step. If we need to run the target model with a full prompt,...
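To illustrate the point above, here is a minimal sketch of a greedy speculative-decoding step: the draft model proposes several tokens autoregressively, then the target model verifies all of them at once and we keep the longest agreeing prefix plus the target's correction. The `draft_logit`/`target_logit` functions below are hypothetical toy stand-ins for real model forward passes, not part of mistral.rs or candle.

```rust
// Toy next-token rules standing in for model forward passes.
fn target_logit(prev: u32) -> u32 {
    (prev * 3 + 1) % 97
}

fn draft_logit(prev: u32) -> u32 {
    // The draft agrees with the target except when `prev` is a
    // multiple of 5, simulating an occasional mismatch.
    let t = target_logit(prev);
    if prev % 5 == 0 { (t + 1) % 97 } else { t }
}

/// One speculative step: the draft proposes `k` tokens, then the target
/// evaluates the context plus all drafts in one pass. We accept drafts
/// until the first disagreement, where we take the target's token instead.
/// Returns how many tokens were appended to `ctx`.
fn speculative_step(ctx: &mut Vec<u32>, k: usize) -> usize {
    let start = ctx.len();

    // 1. Draft proposes `k` tokens autoregressively.
    let mut drafts = Vec::with_capacity(k);
    let mut prev = *ctx.last().unwrap();
    for _ in 0..k {
        let d = draft_logit(prev);
        drafts.push(d);
        prev = d;
    }

    // 2. Target verifies every drafted position (conceptually one forward
    //    pass over context + drafts, which is why multi-token input matters).
    let mut accepted = Vec::new();
    let mut prev = *ctx.last().unwrap();
    for &d in &drafts {
        let t = target_logit(prev);
        if t == d {
            accepted.push(d);
            prev = d;
        } else {
            accepted.push(t); // first mismatch: use the target's token
            break;
        }
    }

    // 3. If every draft was accepted, the target's pass also yields one
    //    "bonus" token at the final position.
    if accepted == drafts {
        accepted.push(target_logit(prev));
    }

    ctx.extend(&accepted);
    ctx.len() - start
}

fn main() {
    let mut ctx = vec![1u32];
    let mut produced = 0;
    while produced < 16 {
        produced += speculative_step(&mut ctx, 4);
    }
    println!("{:?}", ctx);
}
```

With greedy acceptance like this, the output sequence is guaranteed to be identical to what the target model alone would produce; the draft only changes how many tokens are committed per target pass.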
Ok. Would this be similar to #2111?
Ok, so just to confirm: it is this part? > https://github.com/huggingface/candle/pull/2111/files#diff-ed262e4bc9a4a093e64842a2f61a85e1713c4efde0618ac7b31ad58dc5d171e3R137-R149 I can add a PR for this to some of the models if you think it is a good...