CTranslate2

How to get past_kv_cache values?

Open · arunpatro opened this issue 1 year ago · 3 comments

In Hugging Face AutoClass models, we can call model.forward to obtain both the logits and past_key_values. The model.generate method then reuses these past_key_values for efficient generation.
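For readers unfamiliar with the mechanism: a KV cache stores the attention keys and values of tokens already processed, so each decoding step only has to compute them for the new token. What follows is a toy, pure-Python illustration for a single attention head in a single layer. All names are hypothetical, and this is not how transformers or CTranslate2 implement it. It checks that incremental decoding with a cache gives the same output as recomputing attention over the whole prefix.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def full_recompute(prefix: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> Vector:
    """Output at the last position, recomputing K and V for the whole prefix."""
    keys = [matvec(wk, x) for x in prefix]
    values = [matvec(wv, x) for x in prefix]
    return attend(matvec(wq, prefix[-1]), keys, values)


class KVCache:
    """Toy cache for one head: stores K and V of every token seen so far."""

    def __init__(self, wq: Matrix, wk: Matrix, wv: Matrix) -> None:
        self.wq, self.wk, self.wv = wq, wk, wv
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, x: Vector) -> Vector:
        # Only the new token's key and value are computed; older ones are reused.
        self.keys.append(matvec(self.wk, x))
        self.values.append(matvec(self.wv, x))
        return attend(matvec(self.wq, x), self.keys, self.values)


def random_matrix(rng: random.Random, n: int) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
d = 4
wq, wk, wv = random_matrix(rng, d), random_matrix(rng, d), random_matrix(rng, d)
inputs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(5)]

cache = KVCache(wq, wk, wv)
max_diff = 0.0
for t, x in enumerate(inputs):
    cached = cache.step(x)
    full = full_recompute(inputs[: t + 1], wq, wk, wv)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(cached, full)))

print(f"max difference between cached and full recomputation: {max_diff:.2e}")
```

In a real transformer each layer keeps its own cache, and those per-layer keys and values are what Hugging Face returns as past_key_values. The saving is that a new step costs one token's worth of K/V projections instead of the whole prefix.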

How can I get these past_key_values from the generator.forward_batch API?

arunpatro · Aug 02 '23 01:08

At this time, forward_batch is not designed to be used for iterative decoding. It computes the logits (or log probs) for a full sequence.
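To see why a full-sequence scorer is a poor fit for step-by-step decoding, note that without a cache every new token requires scoring the entire prefix again. Here is a minimal sketch that assumes a hypothetical forward callable. It takes a token-id prefix and returns one logits row per position, roughly what a full-sequence scoring call gives per sequence. This is not CTranslate2 code. The sketch counts how many positions get scored in total.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: token-id prefix -> one logits row per position.
ForwardFn = Callable[[Sequence[int]], List[List[float]]]


def greedy_decode_without_cache(forward: ForwardFn, prompt: List[int],
                                max_new_tokens: int, eos_id: int) -> Tuple[List[int], int]:
    """Greedy decoding that re-scores the full prefix at every step.

    Returns the final token ids and the total number of positions scored.
    """
    tokens = list(prompt)
    positions_scored = 0
    for _ in range(max_new_tokens):
        logits = forward(tokens)  # the WHOLE prefix is run again each step
        positions_scored += len(tokens)
        last = logits[-1]  # only the last position's logits are needed
        next_id = max(range(len(last)), key=last.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens, positions_scored


def toy_forward(tokens: Sequence[int]) -> List[List[float]]:
    """Toy 'model' over a 5-token vocabulary: always predicts (token + 1) % 5."""
    vocab = 5
    return [[1.0 if v == (t + 1) % vocab else 0.0 for v in range(vocab)] for t in tokens]


tokens, cost = greedy_decode_without_cache(toy_forward, [0], max_new_tokens=3, eos_id=4)
print(tokens, cost)  # [0, 1, 2, 3] 6
```

For a prompt of length p and n generated tokens, this scores n*p + n(n-1)/2 positions. A KV cache would score the prompt once and then one new position per step, p + n - 1 in total (3 instead of 6 in the toy run).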

Why can't you use generate_batch for generation?

guillaumekln · Aug 02 '23 08:08

I can definitely use generate_batch, and it works fast! But if I had access to the KV cache, or to an HF-like API, I could do fast guided generation as with Outlines. See: PR
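Guided generation of the kind Outlines does restricts which tokens are allowed at every decoding step, for example by tracking a finite-state machine compiled from a regex. It then masks the logits of all other tokens. That is why it needs cheap per-step access to next-token logits, which is where a KV cache matters. Here is a minimal sketch of the masking idea, using a hand-written toy FSM and a hypothetical next_logits callable. This is not Outlines' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical toy FSM: state -> {allowed token id -> next state}.
FSM = Dict[int, Dict[int, int]]


def mask_logits(logits: Sequence[float], allowed: Dict[int, int]) -> List[float]:
    """Set the logit of every token not allowed in the current state to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def guided_greedy(next_logits: Callable[[List[int]], List[float]], fsm: FSM,
                  start: int, final_states: Set[int], max_new_tokens: int) -> List[int]:
    """Greedy decoding constrained so the output is always accepted by the FSM."""
    out: List[int] = []
    state = start
    for _ in range(max_new_tokens):
        allowed = fsm.get(state, {})
        if not allowed:
            break  # no legal continuation from this state
        # next_logits is called once per step; in a real model backed by a
        # KV cache, each call would only need to process the newest token.
        masked = mask_logits(next_logits(out), allowed)
        tok = max(range(len(masked)), key=masked.__getitem__)
        out.append(tok)
        state = allowed[tok]
        if state in final_states:
            break
    return out


# Toy vocabulary: 0="a", 1="b", 2="c", 3="<eos>". Accepted language: "a" ("b" | "c") "<eos>".
fsm: FSM = {0: {0: 1}, 1: {1: 2, 2: 2}, 2: {3: 3}}


def toy_next_logits(prefix: List[int]) -> List[float]:
    """Toy 'model' that, unconstrained, would always emit <eos> first."""
    return [0.1, 0.2, 0.5, 0.9]


print(guided_greedy(toy_next_logits, fsm, start=0, final_states={3}, max_new_tokens=10))
# [0, 2, 3]
```

Unconstrained greedy decoding would stop immediately with <eos>. With the mask, the output follows the FSM and picks "c" over "b" because the model scores it higher.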

arunpatro · Aug 02 '23 21:08