Rémi Louf
I see a few ways we could dramatically reduce the memory that these FSMs take:

1. Make sure that the FSM cannot be further reduced (no redundant states)
2. Reduce...
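Point 1 amounts to DFA minimization. Below is a minimal sketch of Moore-style partition refinement. It assumes a complete DFA stored as a hypothetical `transitions[state][symbol]` dict, which is not necessarily how Outlines stores its FSMs. The sketch drops unreachable states, starts from the accepting/non-accepting split, and keeps splitting blocks whose states lead to different blocks on some symbol.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
# Hypothetical representation: transitions[state][symbol] -> next state.
Transitions = Dict[State, Dict[str, State]]


def reachable_states(start: State, transitions: Transitions) -> Set[State]:
    """Depth-first search from `start`; unreachable states are redundant."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for nxt in transitions[state].values():
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def minimize(
    start: State, transitions: Transitions, finals: Set[State], alphabet: Set[str]
) -> Tuple[int, Dict[int, Dict[str, int]], Set[int]]:
    """Merge equivalent states of a complete DFA by partition refinement.

    Each state of the result is an integer naming a block of equivalent
    original states.
    """
    states = reachable_states(start, transitions)
    symbols = sorted(alphabet)
    # Initial partition: accepting vs. non-accepting.
    block_of = {s: s in finals for s in states}
    while True:
        # Two states stay together only if they share a block and every
        # symbol sends them to the same block.
        ids: Dict[tuple, int] = {}
        new_block_of = {
            s: ids.setdefault(
                (block_of[s],) + tuple(block_of[transitions[s][a]] for a in symbols),
                len(ids),
            )
            for s in states
        }
        if len(ids) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = new_block_of

    # States in the same block have identical block-level transitions.
    new_transitions = {
        new_block_of[s]: {a: new_block_of[transitions[s][a]] for a in symbols}
        for s in states
    }
    new_finals = {new_block_of[s] for s in finals if s in states}
    return new_block_of[start], new_transitions, new_finals


# Example: strings over {a, b} that end in "a". States 0/2 and 1/3 are
# equivalent, and state 4 is unreachable.
example_dfa = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 2},
    2: {"a": 3, "b": 2},
    3: {"a": 3, "b": 2},
    4: {"a": 4, "b": 4},
}
m_start, m_trans, m_finals = minimize(0, example_dfa, {1, 3}, {"a", "b"})
print(len(m_trans))  # 2
```

Moore's version is shown because it is short. Hopcroft's algorithm reaches the same partition in O(kn log n) for n states and k symbols.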
> Is there something tangential out there we could use instead of fsms?

There are a few things to try before completely replacing the DFA logic. For instance, how many...
What would this look like assuming we've implemented `outlines.models.vllm`? Wouldn't it need to expose an async generator?
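In Python, an async generator is an `async def` function that uses `yield`. Callers consume it with `async for`, so each token can be handled as soon as it is produced. Below is a minimal sketch of that shape. `fake_generate_tokens` is a hypothetical stand-in, not part of `outlines.models.vllm` or vLLM itself.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_generate_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model that yields tokens as they are produced."""
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)  # hand control back to the event loop, as a real async engine would
        yield token


async def collect(prompt: str) -> List[str]:
    # The consumer iterates with `async for` and can act on each token as it arrives.
    tokens = []
    async for token in fake_generate_tokens(prompt):
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    print(asyncio.run(collect("Hi")))
```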
This is a very good point, @kelsey-sorrels. What parameters do you usually have to pass to the model?
> Might be able to add Beam Search and Greedy to llama.cpp, I'll look into whether it's possible and create an issue if so.

You can do greedy by setting...
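Greedy decoding takes the highest-scoring token at every step, which is the same as beam search with a beam width of 1. Below is a minimal sketch. `next_token_logits` is a hypothetical callable standing in for the model and is not a llama.cpp API. `toy_logits` is a made-up three-token model used only for illustration.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt: List[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Append the argmax token at each step until EOS or the token budget runs out."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        # Argmax over the vocabulary; ties resolve to the lowest token id.
        best = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(best)
        if best == eos_id:
            break
    return tokens


def toy_logits(tokens: List[int]) -> List[float]:
    # Hypothetical 3-token vocabulary that favours token len(tokens) % 3.
    return [1.0 if i == len(tokens) % 3 else 0.0 for i in range(3)]


if __name__ == "__main__":
    print(greedy_decode(toy_logits, [0], max_new_tokens=10, eos_id=2))
```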
We managed to reproduce the error internally and will hopefully come up with a fix soon.
Yes, I made the change, thank you! I have no guarantee that this will work, though.
Hey @gerdm, do you still intend to work on this?
Outlines author here. I don't know if this can help, but we are about to release a Rust port of our structured generation algorithms, which are of course faster, but...
Of course, I'm only mentioning this because the approach is different, and so is the runtime latency.