GRACE's performance collapses on LLaMA-3-8B
Thanks a lot for the great work.
I want to ask about the performance of GRACE with the LLaMA-3-8B model.
My reproduced results, as well as the results reported by WISE, show that GRACE achieves very good performance (in terms of rewrite accuracy and localization accuracy) with both Llama-2-7b and Mistral-7b.
However, on my side it performs very poorly with LLaMA-3-8B, even with a single edit sample (rewrite accuracy at T=1 is below 40%). Meanwhile, the training loss drops to nearly 0, so I cannot pinpoint the cause of the collapse. I have extensively tuned the hyper-parameters, but nothing has worked so far.
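For reference, this is roughly how I check rewrite accuracy at T=1. It is a simplified sketch with placeholder prompt/target strings, and the GRACE edit step itself is omitted since it goes through the editing codebase:

```python
# Simplified sanity check (placeholder prompt/target, GRACE edit step omitted):
# after a single edit, greedily decode the edit prompt and compare the output
# against the new target string.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
# ... apply the single GRACE edit to `model` here (editor-specific, omitted) ...

prompt = "The capital of France is"   # placeholder edit prompt
target_new = "Paris"                  # placeholder edit target

inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
gen = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print("generated:", repr(gen))
print("rewrite hit:", gen.strip().startswith(target_new))
```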
Do you have any intuitions / suggestions to fix the performance issue with GRACE on LLAMA-3?
Thanks in advance.
This situation seems quite strange. You can try using LLaMA-2 to see if the same issue occurs. If it doesn't, that would indicate a compatibility issue between GRACE and LLaMA-3. We will look into it further on our side. Would you be willing to share the exact version of LLaMA-3 you're using to help with the debugging process?
Thanks for your reply. On my side, the issue only occurs with LLaMA-3; with LLaMA-2 and Mistral, reliability is near 100%.
I also saw a similar report at https://github.com/zjunlp/EasyEdit/issues/487, where GRACE also fails on LLaMA-3 with the EasyEdit repo, and there is no final conclusion yet.
I'm testing mainly with LLaMA-3-8b-Instruct, but the base LLaMA-3 model also fails on my side.
Thanks in advance for your time.
Maybe you can check the tokenizer? The tokenizers of LLaMA-2 and LLaMA-3 are different (SentencePiece vs. a tiktoken-style BPE), which can affect performance significantly.
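For example, a quick (untested) check along these lines would show whether the prompt/target boundary tokenizes differently between the two models; the prompt and target strings here are just placeholders:

```python
# Untested sketch: compare how the LLaMA-2 and LLaMA-3 tokenizers split an
# edit prompt and its target. Editors that build the edit/label tokens by
# encoding the target separately and slicing "prompt + target" can break if
# the concatenated encoding does not equal the two pieces joined together.
from transformers import AutoTokenizer

prompt = "The capital of France is"  # placeholder edit prompt
target = " Paris"                    # placeholder edit target (with leading space)

for name in ["meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    ids_prompt = tok(prompt, add_special_tokens=False)["input_ids"]
    ids_target = tok(target, add_special_tokens=False)["input_ids"]
    ids_joint = tok(prompt + target, add_special_tokens=False)["input_ids"]

    print(name)
    print("  prompt tokens:", tok.convert_ids_to_tokens(ids_prompt))
    print("  target tokens:", tok.convert_ids_to_tokens(ids_target))
    # If this prints False, the target tokens computed in isolation do not line
    # up with the positions edited in the full (prompt + target) sequence.
    print("  boundary aligned:", ids_joint == ids_prompt + ids_target)
```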
Hi wang-kee,
Have you found the reason? I'm using Llama3.1-8b-instruct and have encountered the same issue.