Joel Niklaus

Results: 15 issues by Joel Niklaus

## Issue encountered
When running large evals with many dataset configurations, it is very painful to rerun everything if something fails.

## Solution/Feature
It would be great if intermediate...

feature/enhancement
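A minimal sketch of the kind of resumability this issue asks for, assuming each dataset configuration can be evaluated independently and its metrics saved as a small JSON file. The function `run_with_resume` and its `evaluate` callback are hypothetical names for illustration, not an existing library API. Results are written as soon as each config finishes, so a rerun only recomputes the configs that never completed.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_with_resume(
    configs: Iterable[str],
    evaluate: Callable[[str], Dict[str, float]],  # hypothetical per-config eval function
    results_dir: Path,
) -> Dict[str, Dict[str, float]]:
    """Evaluate each dataset config, skipping ones that already have saved results.

    Assumes config names are safe to use as file names.
    """
    results_dir.mkdir(parents=True, exist_ok=True)
    all_results: Dict[str, Dict[str, float]] = {}
    for config in configs:
        out_file = results_dir / f"{config}.json"
        if out_file.exists():
            # Completed in an earlier run: reload instead of recomputing.
            all_results[config] = json.loads(out_file.read_text())
            continue
        metrics = evaluate(config)
        # Persist immediately so a later failure does not discard this result.
        out_file.write_text(json.dumps(metrics))
        all_results[config] = metrics
    return all_results
```

If the second config raises an exception, rerunning the same call reloads the first config from disk and only evaluates the remaining ones.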

## Issue encountered
Rerunning evaluations with LLM-as-a-judge metrics can be expensive and time-consuming.

## Solution/Feature
Adding a diskcache to the JudgeLM class could solve this.

feature/enhancement
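A minimal sketch of on-disk memoization for judge calls. To stay dependency-free it uses the standard-library `shelve` module as a stand-in for diskcache; `CachedJudge` and the `judge_fn(question, answer, judge_model)` signature are hypothetical and do not reflect the actual JudgeLM interface. The cache key hashes every input that affects the judgment, so changing the answer or the judge model triggers a fresh call.

```python
import hashlib
import shelve
from typing import Callable


class CachedJudge:
    """Hypothetical wrapper that memoizes LLM-judge scores on disk.

    Uses stdlib `shelve` as a stand-in for diskcache.
    """

    def __init__(self, judge_fn: Callable[[str, str, str], float], cache_path: str):
        self.judge_fn = judge_fn  # e.g. a function that queries the judge model
        self.cache_path = cache_path

    @staticmethod
    def _key(question: str, answer: str, judge_model: str) -> str:
        # Hash all inputs so any change to question, answer, or judge is a cache miss.
        payload = "\x00".join((judge_model, question, answer)).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def score(self, question: str, answer: str, judge_model: str) -> float:
        key = self._key(question, answer, judge_model)
        with shelve.open(self.cache_path) as cache:
            if key in cache:
                return cache[key]  # cache hit: no judge call
            result = self.judge_fn(question, answer, judge_model)
            cache[key] = result
            return result
```

Because the cache lives on disk, a second evaluation run (even in a new process) reuses earlier judgments instead of paying for them again.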

## Issue encountered
Inference with large models can be costly and slow, especially on larger datasets. I may also want to re-evaluate my predictions using different metrics.

##...

feature/enhancement
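One way to decouple inference from scoring, sketched under the assumption that each prediction can be stored as a JSON object with hypothetical `id`, `prediction`, and `reference` fields. Predictions are written once to a JSONL file; any number of metrics can then be computed from that file without rerunning the model. The helper names and the two example metrics are illustrative only.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

Record = Dict[str, str]  # hypothetical schema: {"id", "prediction", "reference"}


def save_predictions(records: List[Record], path: Path) -> None:
    """Write one JSON object per line (JSONL)."""
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def load_predictions(path: Path) -> List[Record]:
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def score(records: List[Record], metric: Callable[[str, str], float]) -> float:
    """Average a per-example metric over saved predictions; no inference needed."""
    if not records:
        return 0.0
    return sum(metric(r["prediction"], r["reference"]) for r in records) / len(records)


def exact_match(pred: str, ref: str) -> float:
    return float(pred.strip() == ref.strip())


def case_insensitive_match(pred: str, ref: str) -> float:
    return float(pred.strip().lower() == ref.strip().lower())
```

Swapping `exact_match` for `case_insensitive_match` re-scores the same saved predictions at negligible cost.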

## Issue encountered
Currently, inference with open models on my Mac is quite slow because vLLM does not support MPS.

## Solution/Feature
llama.cpp supports GPU acceleration on Apple Silicon (via Metal) and would significantly...

feature/enhancement
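If a llama.cpp backend were added, the evaluation harness would need some way to pick it on Macs. The helper below is a hypothetical sketch of that dispatch using only the standard `platform` module; the backend names are plain strings for illustration and do not correspond to real configuration flags. Following the issue above, it assumes vLLM cannot use the Apple GPU, so Apple Silicon hosts get llama.cpp.

```python
import platform
from typing import Optional


def choose_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Hypothetical helper: pick an inference backend for the current host.

    Per the issue above, vLLM lacks Apple-GPU support, so Apple Silicon Macs use llama.cpp.
    """
    system = system or platform.system()     # e.g. "Darwin", "Linux"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64"
    if system == "Darwin" and machine == "arm64":
        return "llama.cpp"
    return "vllm"
```

Passing `system` and `machine` explicitly makes the choice easy to override or test; calling it with no arguments inspects the running machine.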

HELM currently evaluates on only 5 LegalBench tasks. Ideally, we would like to be able to run the evaluation on all tasks. I quickly analyzed the structure of the tasks and...