Rémi Louf

Results 533 comments of Rémi Louf

Thank you for opening an issue! Is there any way you could run this in a debugger and tell me what is passed to `self.tokenizer.encode`? I suspect a NumPy array,...
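
If a full debugging session is inconvenient, a quick check like the following would also tell us what is reaching the tokenizer (a sketch; `prompt` and the call site are illustrative, not the actual Outlines internals):

```python
# Hypothetical check at the call site, assuming the prompt reaches
# self.tokenizer.encode unchanged; seeing numpy.ndarray here would confirm the suspicion.
import numpy as np

def debug_encode(tokenizer, prompt):
    print(type(prompt))
    if isinstance(prompt, np.ndarray):
        # HF tokenizers expect strings, so convert before encoding.
        prompt = prompt.tolist()
    return tokenizer.encode(prompt)
```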

> Everywhere else uses choices as a parameter when creating the generator. In my case the choices change each run so I thought this would be the natural way of...

Yes, `maxLength` generates a DFA with many repeated states, which leads to large memory usage. We are considering ways to dramatically reduce the memory usage, but they require low-level work...
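
To make the blow-up concrete, here is a rough sketch using `interegular`, the regex-to-FSM library Outlines relies on (the pattern is an illustration of what a `maxLength` constraint roughly compiles to, not the exact regex we build):

```python
# Counting DFA states for a length-bounded string: the automaton has to track
# how many characters it has consumed, so the state count grows with maxLength.
import interegular

for max_length in (10, 100, 1000):
    pattern = rf'"[^"]{{0,{max_length}}}"'
    fsm = interegular.parse_pattern(pattern).to_fsm()
    print(max_length, len(fsm.states))
```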

We did not have chat models in mind when building the library, and so this is not completely surprising. We could change this behavior, which is indeed a bit of...

Yes, you should be able to use this with multimodal models!

This is surprising to me because even GPT-2 outputs reasonable results. Two things: 1. The model seems to choose weird whitespace patterns. You can constrain this with the `whitespace_pattern` argument...
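
For the first point, a minimal sketch of what I mean (assuming the current `outlines.generate.json` signature; the model, schema, and prompt are placeholders):

```python
import outlines

model = outlines.models.transformers("gpt2")
schema = '{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}'

# Restrict the whitespace the model may emit between JSON tokens to at most a
# single space, which rules out the odd indentation patterns.
generator = outlines.generate.json(model, schema, whitespace_pattern=r" ?")
result = generator("Return a JSON object with a name field: ")
```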

Thank you for opening an issue! I have a few very naive questions and remarks: - Does that mean that HF transformers is going to cache KV values by default?...

This looks awesome and seems to be the way to go! We also have some ideas around cache management that are complementary to yours and it would be nice to...

One quick question @gante: we have our own generation layer in Outlines and don't use `generate`. Can I use a `Cache` instance for `past_key_values`, and will it work out of...
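
For context, what I have in mind is roughly this (a sketch with `DynamicCache`, assuming a `transformers` version where `Cache` objects can be passed as `past_key_values` to a plain forward call):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Structured generation with", return_tensors="pt").input_ids
cache = DynamicCache()

with torch.no_grad():
    # Prefill: the cache is filled and returned on the model output.
    out = model(input_ids, past_key_values=cache, use_cache=True)
    # Decode step: feed only the new token and reuse the cache from the last call.
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    out = model(next_token, past_key_values=out.past_key_values, use_cache=True)
```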

Doing this, I started to wonder whether we shouldn't view and implement greedy and multinomial sampling as particular cases of more general samplers (respectively, a form of beam search and...
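
Roughly what I have in mind, as a sketch (the names are illustrative, not the Outlines sampler API):

```python
import torch

def sample(logits: torch.Tensor, temperature: float = 1.0, num_samples: int = 1) -> torch.Tensor:
    """A single sampler that recovers the usual special cases.

    temperature == 0 degenerates to greedy decoding (a 1-beam search step);
    temperature == 1 with num_samples == 1 is plain multinomial sampling.
    """
    if temperature == 0.0:
        return logits.argmax(dim=-1, keepdim=True).expand(-1, num_samples)
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=num_samples)
```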