lm-evaluation-harness

[Question] A way to run multiple evals on multiple models?

Open · tanaymeh opened this issue 6 months ago · 0 comments

Hi, I was wondering whether the Python API supports running multiple eval benchmarks on multiple models, e.g. by passing a list of models and their respective arguments via the model_args argument?

For example:

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=["pretrained=microsoft/phi-2,trust_remote_code=True", "pretrained=microsoft/phi-3,trust_remote_code=True"],
    tasks=["hellaswag", "mmlu_abstract_algebra"],
    log_samples=True,
)
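In the meantime, the only workaround I can think of is to loop over the model configurations and call simple_evaluate once per model, roughly like the sketch below (the model names and the way results are collected are just placeholders on my side):

import lm_eval

# Sketch of a per-model loop: each model_args string is evaluated separately,
# and the results are collected in a dict keyed by the model configuration.
model_configs = [
    "pretrained=microsoft/phi-2,trust_remote_code=True",
    "pretrained=microsoft/phi-3,trust_remote_code=True",
]
tasks = ["hellaswag", "mmlu_abstract_algebra"]

all_results = {}
for model_args in model_configs:
    all_results[model_args] = lm_eval.simple_evaluate(
        model="hf",
        model_args=model_args,
        tasks=tasks,
        log_samples=True,
    )

Is there a more direct, built-in way to do this?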

TIA!

tanaymeh · Jul 29, 2024