
How to get the Activation cache while the LLM is generating new tokens?

Meehaohao opened this issue 1 year ago • 2 comments

Question

My prompt is `"Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: "`, which is a question for the LLM. I want to get the cached hidden states of the tokens the LLM produces while it is generating the response. How can I do that?

On one hand, the code `logits, cache = model.run_with_cache(prompt, return_cache_object=True)` only caches the hidden states of the prompt, because it doesn't run the generate function.

On the other hand, the code `output = model.generate(prompt, do_sample=False, max_new_tokens=20)` only returns the generated tokens (or the decoded sentence); I can't get the activation cache of the generated answer.

So how can I obtain both the model's response and the activation cache of the response tokens in a single generation pass?
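
For concreteness, here is a minimal sketch of the two attempts (assuming the TransformerLens package and a `HookedTransformer`; `gpt2` is a stand-in for the actual model):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in for the actual LLM
prompt = ("Which instrument does Henry Halstead mainly play? "
          "Please answer an instrument name. Answer: ")

# Attempt 1: caches activations, but only for the prompt's single forward pass.
logits, cache = model.run_with_cache(prompt, return_cache_object=True)

# Attempt 2: produces the answer, but exposes no activation cache.
output = model.generate(prompt, do_sample=False, max_new_tokens=20)
```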

Meehaohao · Aug 07 '24

Unfortunately, there is currently no integration of the activation cache into the generate function. I don't see any reason why we can't add that as an option, but it would unfortunately be a pretty low priority given some other projects currently being worked on, unless someone volunteers to do it.
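
In the meantime, a manual decoding loop is a workable substitute: generate the tokens yourself, then run `run_with_cache` over the finished sequence. A sketch, assuming greedy decoding (matching `do_sample=False`) and `gpt2` as a stand-in model:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in for the actual LLM
prompt = ("Which instrument does Henry Halstead mainly play? "
          "Please answer an instrument name. Answer: ")

tokens = model.to_tokens(prompt)  # shape [1, prompt_len]
for _ in range(20):  # max_new_tokens
    logits = model(tokens)  # plain forward pass; no caching needed yet
    next_token = logits[0, -1].argmax().view(1, 1)  # greedy pick
    tokens = torch.cat([tokens, next_token], dim=-1)
    if next_token.item() == model.tokenizer.eos_token_id:
        break

# One cached run over the full sequence: the cache now covers the prompt
# and every generated token, e.g. cache["blocks.0.hook_resid_post"].
logits, cache = model.run_with_cache(tokens, return_cache_object=True)
print(model.to_string(tokens[0]))
```

Note this re-runs the whole sequence at every step, so it is quadratic in sequence length; it trades speed for simplicity. Another option is to wrap the `generate` call in the `model.hooks(fwd_hooks=...)` context manager and record activations from each internal forward pass, though with KV caching enabled each pass after the first only sees the newest token.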

bryce13950 · Aug 16 '24

Got it, thank you.

Meehaohao · Aug 16 '24