Retrieval_Head
Open-source code for the paper: Retrieval Head Mechanistically Explains Long-Context Factuality
https://github.com/nightdessert/Retrieval_Head/blob/3ac171a6f71ce7ef1cda57d4215c390fb6ab51f2/faiss_attn/source/modeling_llama.py#L685-L689 Hi there, I'm working on the interpretability of attention heads, and your work is really inspiring. I'm not experienced in modifying these attention heads, so I'm a little...
Hi, thank you for the awesome code! When I run `python retrieval_head_detection.py --model_path microsoft/Phi-3-mini-128k-instruct --s 0 --e 50000`, I get the following errors: Could you share any tips to fix...
Running `python3 needle_in_haystack_with_mask.py` fails with `TypeError: Qwen2ForCausalLM.forward() got an unexpected keyword argument 'block_list'`
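A `TypeError` of this kind means the `forward` method that was actually called does not declare a `block_list` parameter. One plausible reading (an assumption, not confirmed in this thread) is that the loaded Qwen2 class is a stock implementation rather than a patched one. The sketch below shows a generic way to check this before calling the model. `accepts_kwarg`, `call_forward`, `PatchedModel`, and `StockModel` are hypothetical names used only for illustration, and the stand-in classes do not reproduce the repository's code.

```python
import inspect


def accepts_kwarg(fn, name):
    """Return True if `fn` can be called with the keyword argument `name`."""
    params = inspect.signature(fn).parameters
    if name in params:
        kind = params[name].kind
        return kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class PatchedModel:
    """Hypothetical stand-in for a model whose forward() takes block_list."""

    def forward(self, input_ids, block_list=None):
        return input_ids


class StockModel:
    """Hypothetical stand-in for an unmodified model class."""

    def forward(self, input_ids):
        return input_ids


def call_forward(model, input_ids, block_list=None):
    """Pass block_list only when forward() declares it; otherwise fail clearly."""
    kwargs = {}
    if block_list is not None:
        if not accepts_kwarg(model.forward, "block_list"):
            raise TypeError(
                f"{type(model).__name__}.forward() does not accept 'block_list'; "
                "the loaded class may not be the patched implementation"
            )
        kwargs["block_list"] = block_list
    return model.forward(input_ids, **kwargs)
```

Running `accepts_kwarg(model.forward, "block_list")` on the real model object right after loading it would show whether the patched class is the one in use.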
https://github.com/nightdessert/Retrieval_Head/blob/3ac171a6f71ce7ef1cda57d4215c390fb6ab51f2/retrieval_head_detection.py#L188 For the "llama-2-7b-80k" model, why do we need to reset the RoPE parameters? It seems that config.json already includes the scale_factor.
I would like to reopen the issue: https://github.com/nightdessert/Retrieval_Head/issues/6#issuecomment-3183153245 Any help from the authors is highly appreciated. Thanks very much!
It's strange that the colon didn't cause an error: the code continued to execute, but the line had no effect.