[🎯 Roadmap] EvalScope Roadmap
English Version
Planned Benchmarks Support
1. Agent
- [x] 𝜏²-Bench #959
- [ ] Terminal-Bench
2. Code
- [ ] Multi-E
- [x] SciCode
- [x] SWE-Bench #976
3. Instruction Following
- [x] IFBench
4. Vision Language
- [ ] RefCOCO
- [ ] CC-OCR
5. Audio
- [ ] FLEURS
Features
- [ ] Performance Testing Enhancement: Support dynamic concurrency adjustment and automatic testing of model service metrics including minimum latency, TTFT (Time To First Token), and maximum throughput
- [x] Extended Evaluation Metrics: Add support for more evaluation metrics, including cons@k, G-pass@k, etc.
- [x] Function Call & Tool Use: Add support for evaluating custom function-call and tool-use scenarios
- [ ] Prompt Management Optimization: Improve prompt management to facilitate setting different prompts for benchmarks
- [ ] Safety Benchmarks: Support safety-related benchmarks (suggestions for datasets are welcome)
- [ ] UI Development: Develop an interactive UI interface for visual model evaluation (long-term goal)
- [ ] Benchmark Collections: More comprehensive support for benchmark collections, used to evaluate curated benchmark suites
- [ ] Embedding Stress Testing: Support stress testing of embedding model services
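As a rough sketch of what the planned performance-testing enhancement would measure, the snippet below times TTFT (Time To First Token) and token throughput for a streaming response. The `stream_tokens` generator is a hypothetical stand-in for a real model service endpoint, used here purely for illustration; it is not part of the EvalScope API.

```python
import time

def stream_tokens(n_tokens=5, delay=0.01):
    # Hypothetical stand-in for a streaming model endpoint:
    # yields one token every `delay` seconds.
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"

def measure_stream(stream):
    """Return (ttft_seconds, tokens_per_second) for a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # Time from request start until the first token arrives.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, count / total

ttft, tps = measure_stream(stream_tokens())
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tok/s")
```

A stress-testing harness would run many such measurements under increasing concurrency and report the minimum latency, TTFT distribution, and peak sustained throughput.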
Bug Fixes
- Embedding Model Evaluation: Fix the benchmark misalignment issue in embedding model evaluation (Issue: https://github.com/modelscope/evalscope/issues/753)
- RAG Evaluation: Fix the issue where evaluation sets cannot be automatically constructed in rageval (Issue: https://github.com/modelscope/evalscope/issues/859)