New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark

Open • kmazrolina opened this issue 1 month ago • 0 comments

  • Added metric definitions that use llama-3-3-70b as the judge in the Arena Hard benchmark, with support for:
    • WML Inference Engine
    • Generic Inference Engine

kmazrolina • Oct 27 '25 13:10