
"Accuracy" metric for LLM model(s)

Open freedomtan opened this issue 10 months ago • 104 comments

Which "accuracy" metric(s) should we use for LLM benchmarking?

  • MMLU: usually the first choice. It covers several fields with multiple-choice questions.
    • @mohitmundhragithub please point out where/how MLPerf Client uses this
    • running the full set will most likely take several hours on Android devices
    • hence, the other choice is TinyMMLU
    • TinyMMLU: 100 questions only
  • Other tasks such as summarization, Q/A
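
For reference, here is a rough sketch of how an MMLU-style accuracy number could be computed once we have model outputs. It is only an illustration, not how MLPerf Client or this app does it. The names `extract_choice` and `accuracy` are hypothetical, and it assumes four answer options (A to D) with the model's answer letter as the first non-blank character of its output.

```python
from typing import Optional, Sequence

# Assumption: four answer options per question, labeled A-D.
CHOICES = "ABCD"


def extract_choice(model_output: str) -> Optional[str]:
    """Return the answer letter if the output starts with one, else None.

    Simplifying assumption: the model is prompted to reply with the
    letter first, e.g. "B" or "b) because ...".
    """
    text = model_output.strip().upper()
    if text and text[0] in CHOICES:
        return text[0]
    return None


def accuracy(outputs: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose extracted letter matches the gold letter."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    # An unparseable output (None) never equals a gold letter, so it
    # counts as a wrong answer.
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Made-up outputs for illustration only, not real benchmark results.
    demo_outputs = ["A", " b) because ...", "C", "I don't know"]
    demo_gold = ["A", "B", "D", "D"]
    print(accuracy(demo_outputs, demo_gold))
```

The same function works for the full set and for the 100-question TinyMMLU subset; only the number of questions passed in changes.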

freedomtan, Apr 22 '25 05:04