lm-evaluation-harness
[Question] A way to run multiple evals on multiple models?
Hi, is there a way supported by the Python API to run multiple eval benchmarks on multiple models, e.g. by passing a list of models and their respective arguments via the model_args argument?
For example:
```python
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=["pretrained=microsoft/phi-2,trust_remote_code=True", "pretrained=microsoft/phi-3,trust_remote_code=True"],
    tasks=["hellaswag", "mmlu_abstract_algebra"],
    log_samples=True,
)
```
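For context, this is what I am doing today as a workaround: one simple_evaluate call per model inside a plain Python loop (the loop and the results dict are just my own sketch, not part of the library API):

```python
import lm_eval

# One model_args string per model, in the same format simple_evaluate already accepts.
model_args_list = [
    "pretrained=microsoft/phi-2,trust_remote_code=True",
    "pretrained=microsoft/phi-3,trust_remote_code=True",
]

all_results = {}
for model_args in model_args_list:
    # tasks already accepts a list, so multiple benchmarks per model work;
    # it is the multiple-models part that needs the explicit loop.
    all_results[model_args] = lm_eval.simple_evaluate(
        model="hf",
        model_args=model_args,
        tasks=["hellaswag", "mmlu_abstract_algebra"],
        log_samples=True,
    )
```

Is there a built-in way to get the same effect in a single call?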
TIA!