CTranslate2
How to obtain past_kv_cache values?
In Hugging Face AutoClass models, we can call model.forward to obtain both the logits and past_key_values. model.generate then reuses these past_key_values for efficient generation.
How can I return these past_key_values from the generator.forward_batch API?
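A rough sketch of what past_key_values buys: the cache stores the attention keys and values of tokens already processed, so each decoding step only projects the newest token instead of the whole prefix. The toy below is a single attention head in pure Python with made-up weights; `W_Q`, `W_K`, `W_V` and every function name are hypothetical and are not Hugging Face or CTranslate2 code. It only illustrates that both decoding styles produce the same outputs while the cached one does far fewer key/value projections.

```python
import math
from typing import List, Tuple

Vec = List[float]

# Made-up 2x2 projection matrices standing in for one attention head's W_Q, W_K, W_V.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.3]]
W_V = [[1.0, 1.0], [0.0, 2.0]]


def project(x: Vec, w: List[Vec]) -> Vec:
    """Matrix-vector product w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention of one query over all keys/values so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def decode_without_cache(xs: List[Vec]) -> Tuple[List[Vec], int]:
    """Recompute keys/values for the whole prefix at every step."""
    outputs, projections = [], 0
    for t in range(1, len(xs) + 1):
        prefix = xs[:t]
        keys = [project(x, W_K) for x in prefix]
        values = [project(x, W_V) for x in prefix]
        projections += 2 * t
        outputs.append(attend(project(prefix[-1], W_Q), keys, values))
    return outputs, projections


def decode_with_cache(xs: List[Vec]) -> Tuple[List[Vec], int]:
    """Keep past keys/values and only project the newest token each step."""
    past_keys: List[Vec] = []
    past_values: List[Vec] = []
    outputs, projections = [], 0
    for x in xs:
        past_keys.append(project(x, W_K))
        past_values.append(project(x, W_V))
        projections += 2
        outputs.append(attend(project(x, W_Q), past_keys, past_values))
    return outputs, projections


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    _, full_work = decode_without_cache(xs)
    _, cached_work = decode_with_cache(xs)
    print(full_work, cached_work)  # 20 8
```

For 4 tokens the uncached loop does 2·(1+2+3+4) = 20 key/value projections versus 8 with the cache; the gap grows quadratically with sequence length, which is why re-running a full-sequence forward pass at every step is slow.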
At this time, forward_batch is not designed to be used for iterative decoding. It computes the logits (or log probs) for a full sequence.
Why can't you use generate_batch for generation?
I definitely can use generate_batch, and it works fast! But if I had access to the KV cache, or an HF-like API, I could do fast guided generation like Outlines does. See: PR
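The core idea behind Outlines-style guided generation is to mask the logits of tokens that a constraint disallows before choosing the next token. Without a KV cache, one workaround is to re-score the full prefix at every step, which is the quadratic cost a cache avoids. A minimal sketch under stated assumptions: `next_token_logits`, `allowed_tokens`, `guided_greedy_decode`, and the toy vocabulary are all hypothetical. Outlines itself derives its constraints from regexes or grammars, and wiring this to CTranslate2 would require checking the forward_batch output format in its documentation.

```python
import math
from typing import Callable, List, Sequence, Set

# Hypothetical scorer: full token prefix -> logits for the next position.
# Something like this could wrap a full-sequence forward pass (e.g. taking the
# last position of forward_batch's output), but here it is a toy function.
NextTokenLogits = Callable[[Sequence[int]], List[float]]
# Hypothetical constraint: full token prefix -> set of token ids allowed next.
AllowedTokens = Callable[[Sequence[int]], Set[int]]


def guided_greedy_decode(
    prompt: Sequence[int],
    next_token_logits: NextTokenLogits,
    allowed_tokens: AllowedTokens,
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # No KV cache: the whole prefix is re-scored on every step.
        logits = next_token_logits(tokens)
        allowed = allowed_tokens(tokens)
        # Mask disallowed tokens so greedy selection can only pick allowed ones.
        masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
        next_id = max(range(len(masked)), key=masked.__getitem__)
        if masked[next_id] == -math.inf:
            break  # the constraint allows nothing; stop
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[len(prompt):]


# Toy vocabulary: 0=<eos>, 1="a", 2="b", 3="1", 4="2".
def toy_logits(prefix: Sequence[int]) -> List[float]:
    # Unconstrained, "a" (id 1) always has the highest logit.
    return [0.0, 5.0, 4.0, 1.0 + 2.0 * (len(prefix) % 2), 2.0]


def two_digits_then_eos(prefix: Sequence[int]) -> Set[int]:
    # Allow digit tokens until two have been emitted, then only <eos>.
    digits = sum(1 for t in prefix if t in (3, 4))
    return {3, 4} if digits < 2 else {0}


if __name__ == "__main__":
    print(guided_greedy_decode([1, 2], toy_logits, two_digits_then_eos,
                               eos_id=0, max_new_tokens=10))
    # → [4, 3, 0]
```

Keeping the scorer behind a plain callable makes the constraint logic independent of the backend, at the price of one full forward pass per generated token; a cached backend would let the same loop feed only the newest token.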