LongICLBench: Unexpected low performance of mistral-v0.1

Open Mooler0410 opened this issue 10 months ago • 4 comments

Hi!

I tried to run BANKING77 with mistral-7b-v0.1, but I got nearly zero performance. I have no idea how this could happen. Have you ever tested mistral-7b-v0.1? If not, have you ever encountered similar issues?

Thanks!

Apr 19 '24 18:04 Mooler0410

v0.1 only supports an 8K-token context length, which leads to low performance. We use v0.2 because it supports 32K tokens.

Apr 19 '24 19:04 wenhuchen


The first three subsets of BANKING77 are below 8K tokens, so on those subsets I expected mistral-7b-v0.1 to perform worse than v0.2 but at least comparably to other models. However, I got (nearly) zero performance.
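The argument here is that, on length alone, subsets whose prompts fit inside the 8K window should not collapse to zero. A minimal Python sketch of that length check follows, assuming you already have per-prompt token counts for each subset (one way to get them is a Hugging Face tokenizer, e.g. `len(tokenizer(prompt)["input_ids"])`). The names `CONTEXT_LIMITS` and `subsets_within_limit` and the `max_new_tokens` generation budget are hypothetical and not part of LongICLBench; reading "8K" and "32K" as 8,192 and 32,768 tokens is also an assumption.

```python
from typing import Dict, List

# Assumed context limits, following the thread: v0.1 treated as 8K, v0.2 as 32K.
CONTEXT_LIMITS: Dict[str, int] = {
    "mistral-7b-v0.1": 8 * 1024,
    "mistral-7b-v0.2": 32 * 1024,
}


def subsets_within_limit(
    prompt_lengths: Dict[str, List[int]],
    context_limit: int,
    max_new_tokens: int = 0,
) -> List[str]:
    """Return the subsets whose longest prompt, plus room for generation, fits.

    prompt_lengths maps a (hypothetical) subset name to the token counts of its
    prompts. Empty subsets are skipped.
    """
    fitting = []
    for name, lengths in prompt_lengths.items():
        if lengths and max(lengths) + max_new_tokens <= context_limit:
            fitting.append(name)
    return fitting


if __name__ == "__main__":
    # Toy token counts for illustration only, not BANKING77 statistics.
    toy = {"subset_a": [3000, 3500], "subset_b": [9000]}
    print(subsets_within_limit(toy, CONTEXT_LIMITS["mistral-7b-v0.1"], max_new_tokens=50))
    # prints ['subset_a']
```

If a subset passes this check for v0.1 and still scores near zero, context length alone does not explain the result, which points toward other causes such as the prompt format.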

Using the same code and replacing mistral-7b-v0.1 with mistral-7b-v0.2, I get results similar to those reported in the paper.

I'm not sure what I did wrong 🙃

Apr 19 '24 19:04 Mooler0410

We observe a similar trend for Gemma, which should support 8K tokens but actually fails on tasks shorter than 8K tokens.

Apr 19 '24 19:04 wenhuchen

Got it. It seems to be a prompting issue. I will try modifying the prompts to see if that makes any difference.

Thank you for your clarification!

Apr 19 '24 19:04 Mooler0410
