unitxt
New metric definitions for llama-3-3-70b as judge in the Arena Hard benchmark
- Added metric definitions for llama-3-3-70b as the judge in the Arena Hard benchmark, supporting:
  - WML Inference Engine
  - Generic Inference Engine