
How to get the Activation cache while the LLM is generating new tokens?

Meehaohao opened this issue 1 year ago • 2 comments

Question

My prompt is `"Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: "`, which is a question for the LLM. I want to get the cached hidden states of the tokens the LLM produces while it is generating the response. How can I do that?

On one hand, the code `logits, cache = model.run_with_cache(prompt, return_cache_object=True)` only caches the hidden states of the prompt, because it doesn't run the generate function.

On the other hand, the code `output = model.generate(prompt, do_sample=False, max_new_tokens=20)` only returns the generated tokens (or the decoded sentence); I can't get the activation cache of the generated answer.

So how can I obtain both the model's response and the activation cache of the response tokens in a single generation pass?
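
For concreteness, here is a minimal sketch of the two attempts (assuming the TransformerLens package and a `HookedTransformer`; `gpt2` is a stand-in for the actual model):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in for the actual LLM
prompt = ("Which instrument does Henry Halstead mainly play? "
          "Please answer an instrument name. Answer: ")

# Attempt 1: caches activations, but only for the prompt's single forward pass.
logits, cache = model.run_with_cache(prompt, return_cache_object=True)

# Attempt 2: produces the answer, but exposes no activation cache.
output = model.generate(prompt, do_sample=False, max_new_tokens=20)
```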

Meehaohao · Aug 07 '24

Unfortunately, there is currently no integration of the activation cache into the generate function. I don't see any reason why we can't add that as an option, but it would unfortunately be a pretty low priority given some other projects currently being worked on, unless someone volunteers to do it.
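
In the meantime, a manual decoding loop is a workable substitute: generate the tokens yourself, then run `run_with_cache` over the finished sequence. A sketch, assuming greedy decoding (matching `do_sample=False`) and `gpt2` as a stand-in model:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in for the actual LLM
prompt = ("Which instrument does Henry Halstead mainly play? "
          "Please answer an instrument name. Answer: ")

tokens = model.to_tokens(prompt)  # shape [1, prompt_len]
for _ in range(20):  # max_new_tokens
    logits = model(tokens)  # plain forward pass; no caching needed yet
    next_token = logits[0, -1].argmax().view(1, 1)  # greedy pick
    tokens = torch.cat([tokens, next_token], dim=-1)
    if next_token.item() == model.tokenizer.eos_token_id:
        break

# One cached run over the full sequence: the cache now covers the prompt
# and every generated token, e.g. cache["blocks.0.hook_resid_post"].
logits, cache = model.run_with_cache(tokens, return_cache_object=True)
print(model.to_string(tokens[0]))
```

Note this re-runs the whole sequence at every step, so it is quadratic in sequence length; it trades speed for simplicity. Another option is to wrap the `generate` call in the `model.hooks(fwd_hooks=...)` context manager and record activations from each internal forward pass, though with KV caching enabled each pass after the first only sees the newest token.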

bryce13950 · Aug 16 '24

Got it, thank you.

Meehaohao · Aug 16 '24