LongBench
Are agentic systems in scope for your leaderboard?
I suspect it would be easy to beat the scores you are getting (and maybe even get close to 100%) with multi-turn and agentic systems such as LLMLingua-2 or GraphReader.
Aggregating such tricks, and understanding how to get acceptable performance out of an LLM, seems important to anyone building a system for production.
Would you consider accepting submissions of such agentic systems to your leaderboard? If you do, it would be interesting to also report the total tokens consumed and the number of consecutive steps taken.
Interesting! We would like to see how these agentic systems perform on the realistic tasks in LongBench v2, and we welcome your submissions!