[FEATURE]: Terminal Bench 2.0 evaluation
Feature hasn't been suggested before.
- [x] I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
Hi OpenCode team! Thank you for a great product. It would be great for OpenCode's marketing to evaluate your harness on Terminal Bench 2.0. Perhaps you have evaluated it before?
This issue might be a duplicate of existing issues. Please check:
- #2104: Submit to Terminal-Bench - A previous request to submit OpenCode to the Terminal Bench leaderboard for performance comparison.
Feel free to ignore if your specific case requires something different.
We are working on our own benchmarks, but I'd refer you to this: https://github.com/sst/opencode/issues/2104#issuecomment-3215423015
@rekram1-node thanks for the answer! Same results as CC, which is actually very good. But it would be interesting to see OpenCode's score on Terminal Bench 2.0 with open source LLMs. @thdxr what about Terminal Bench 2.0? Is it still an "academic puzzle"? (I don't know and am really curious.) Btw, I'm a founder of the CodeAlive context engine; we're also looking for good benchmarks, and we haven't explored the new Terminal Bench yet.