Yi
It would be nice to test the model on more benchmarks, such as MT-Bench, AGIEval, BBH MC, TruthfulQA, MMLU, HumanEval, BBH CoT, and GSM8K.
More benchmark results will be included in our technical report, which will be released later.
The technical report has been released: https://arxiv.org/abs/2403.04652