lm-evaluation-harness
[Question] A way to run multiple evals on multiple models?
Hi, is there a way supported by the Python API to run multiple eval benchmarks on multiple models, e.g. by passing a list of models and their respective arguments via the model_args argument?
For example:
```python
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=["pretrained=microsoft/phi-2,trust_remote_code=True", "pretrained=microsoft/phi-3,trust_remote_code=True"],
    tasks=["hellaswag", "mmlu_abstract_algebra"],
    log_samples=True,
)
```
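For context, this is what I am doing today as a workaround: one simple_evaluate call per model inside a plain Python loop (the loop and the results dict are just my own sketch, not part of the library API):

```python
import lm_eval

# One model_args string per model, in the same format simple_evaluate already accepts.
model_args_list = [
    "pretrained=microsoft/phi-2,trust_remote_code=True",
    "pretrained=microsoft/phi-3,trust_remote_code=True",
]

all_results = {}
for model_args in model_args_list:
    # tasks already accepts a list, so multiple benchmarks per model work;
    # it is the multiple-models part that needs the explicit loop.
    all_results[model_args] = lm_eval.simple_evaluate(
        model="hf",
        model_args=model_args,
        tasks=["hellaswag", "mmlu_abstract_algebra"],
        log_samples=True,
    )
```

Is there a built-in way to get the same effect in a single call?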
TIA!