Are agentic systems in scope for your leaderboard?

Open L3Gaunt opened this issue 9 months ago • 1 comment

I suspect it is easy to beat the scores you are getting (and maybe even get close to 100%) with multi-turn and agentic systems like LLMLingua-2 or GraphReader.

Aggregating such tricks, and understanding how to get acceptable performance out of an LLM, seems important to anyone building a production system.

Would you consider accepting submissions of such agentic systems to your leaderboard? If you do, it would be interesting to also report the total tokens consumed and the number of consecutive steps taken.

Mar 05 '25 22:03 L3Gaunt

Interesting! We would like to see how these agentic systems perform on the realistic tasks in LongBench v2. We welcome your submissions!

Mar 12 '25 05:03 bys0318