Rémi Louf

Results 533 comments of Rémi Louf

Thank you for opening an issue! Is there any way you could run this in a debugger and tell me what is passed to `self.tokenizer.encode`? I suspect a NumPy array,...
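
If a full debugging session is inconvenient, a quick check like the following would also tell us what is reaching the tokenizer (a sketch; `prompt` and the call site are illustrative, not the actual Outlines internals):

```python
# Hypothetical check at the call site, assuming the prompt reaches
# self.tokenizer.encode unchanged; seeing numpy.ndarray here would confirm the suspicion.
import numpy as np

def debug_encode(tokenizer, prompt):
    print(type(prompt))
    if isinstance(prompt, np.ndarray):
        # HF tokenizers expect strings, so convert before encoding.
        prompt = prompt.tolist()
    return tokenizer.encode(prompt)
```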

> Everywhere else uses choices as a parameter when creating the generator. In my case the choices change each run so I thought this would be the natural way of...

Yes, `maxLength` generates a DFA with many repeated states, which leads to large memory usage. We are considering ways to dramatically reduce the memory usage, but they require low-level work...
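
To make the blow-up concrete, here is a rough sketch using `interegular`, the regex-to-FSM library Outlines relies on (the pattern is an illustration of what a `maxLength` constraint roughly compiles to, not the exact regex we build):

```python
# Counting DFA states for a length-bounded string: the automaton has to track
# how many characters it has consumed, so the state count grows with maxLength.
import interegular

for max_length in (10, 100, 1000):
    pattern = rf'"[^"]{{0,{max_length}}}"'
    fsm = interegular.parse_pattern(pattern).to_fsm()
    print(max_length, len(fsm.states))
```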

We did not have chat models in mind when building the library, and so this is not completely surprising. We could change this behavior, which is indeed a bit of...

Yes, you should be able to use this with multimodal models!

This is surprising to me because even GPT-2 outputs reasonable results. Two things: 1. The model seems to choose weird whitespace patterns. You can constrain this with the `whitespace_pattern` argument...
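
For the first point, a minimal sketch of what I mean (assuming the current `outlines.generate.json` signature; the model, schema, and prompt are placeholders):

```python
import outlines

model = outlines.models.transformers("gpt2")
schema = '{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}'

# Restrict the whitespace the model may emit between JSON tokens to at most a
# single space, which rules out the odd indentation patterns.
generator = outlines.generate.json(model, schema, whitespace_pattern=r" ?")
result = generator("Return a JSON object with a name field: ")
```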

Thank you for opening an issue! I have a few very naive questions and remarks: - Does that mean that HF transformers is going to cache KV values by default?...

This looks awesome and seems to be the way to go! We also have some ideas around cache management that are complementary to yours and it would be nice to...

One quick question @gante: we have our own generation layer in Outlines and don't use `generate`. Can I use a `Cache` instance for `past_key_values`, and will it work out of...
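
For context, what I have in mind is roughly this (a sketch with `DynamicCache`, assuming a `transformers` version where `Cache` objects can be passed as `past_key_values` to a plain forward call):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Structured generation with", return_tensors="pt").input_ids
cache = DynamicCache()

with torch.no_grad():
    # Prefill: the cache is filled and returned on the model output.
    out = model(input_ids, past_key_values=cache, use_cache=True)
    # Decode step: feed only the new token and reuse the cache from the last call.
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    out = model(next_token, past_key_values=out.past_key_values, use_cache=True)
```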

Doing this, I started to wonder whether we shouldn't view and implement greedy and multinomial sampling as particular cases of more general samplers (respectively, a form of beam search and...
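
Roughly what I have in mind, as a sketch (the names are illustrative, not the Outlines sampler API):

```python
import torch

def sample(logits: torch.Tensor, temperature: float = 1.0, num_samples: int = 1) -> torch.Tensor:
    """A single sampler that recovers the usual special cases.

    temperature == 0 degenerates to greedy decoding (a 1-beam search step);
    temperature == 1 with num_samples == 1 is plain multinomial sampling.
    """
    if temperature == 0.0:
        return logits.argmax(dim=-1, keepdim=True).expand(-1, num_samples)
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=num_samples)
```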