`Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`

Open fuqichen1998 opened this issue 1 year ago • 5 comments

As the title, my evaluation of Llama2-7B-chat-4k on PassageRetrieval-zh gets 10.12, which is significantly higher than the README (0.5), could you please share why?

Mar 18 '24 21:03 fuqichen1998

Hi! Are you using the prompt template as in config/dataset2prompt.json?

Mar 20 '24 09:03 bys0318

We refer to our code here for the llama2 prompt: https://github.com/THUDM/LongBench/blob/main/pred.py#L33

Mar 20 '24 09:03 bys0318

Yes, I was using your pred.py to run the inference and evaluation.

Mar 20 '24 17:03 fuqichen1998

Yes, I was using your pred.py to run the inference and evaluation.

Acutally I also get the same result

May 09 '24 05:05 slatter666

We refer to our code here for the llama2 prompt: https://github.com/THUDM/LongBench/blob/main/pred.py#L33

The INST is necessary for llama2-7b/llama2-13b?

Sep 06 '24 02:09 condy0919