Joel Niklaus issues

Results: 15 issues of Joel Niklaus

[FT] Adding caching for each dataset run

Issue encountered: When running large evals with many dataset configurations, it is very painful to rerun everything if something fails. Solution/Feature: It would be great if intermediate...

feature/enhancement

[FT] Adding diskcache

Issue encountered: Rerunning evaluations with LLM-as-judge metrics can be expensive and time-consuming. Solution/Feature: Adding a diskcache to the JudgeLM class could solve this.

feature/enhancement
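
The caching idea in this request can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the diskcache package and not the actual JudgeLM class: `DiskCachedJudge`, `fake_judge`, and the one-JSON-file-per-prompt layout are all hypothetical names and choices made for the example. It assumes the judge is a callable that maps a prompt string to a JSON-serializable result.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class DiskCachedJudge:
    """Hypothetical wrapper that stores judge results on disk, keyed by a prompt hash."""

    def __init__(self, judge_fn, cache_dir):
        # judge_fn: any callable taking a prompt (str) and returning JSON-serializable data
        self.judge_fn = judge_fn
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, prompt):
        # A stable hash of the prompt serves as the cache file name.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.json"

    def __call__(self, prompt):
        path = self._path_for(prompt)
        if path.exists():
            # Cache hit: skip the expensive judge call entirely.
            return json.loads(path.read_text(encoding="utf-8"))
        result = self.judge_fn(prompt)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result


# Stand-in for an expensive LLM judge call (assumption for the demo).
calls = []


def fake_judge(prompt):
    calls.append(prompt)
    return {"score": len(prompt) % 5}


cache_dir = tempfile.mkdtemp()
judge = DiskCachedJudge(fake_judge, cache_dir)
first = judge("Is this answer correct?")
second = judge("Is this answer correct?")  # served from disk

# A fresh wrapper pointed at the same directory also reuses the stored result.
restarted = DiskCachedJudge(fake_judge, cache_dir)
third = restarted("Is this answer correct?")
```

Because the cache lives on disk, a rerun after a crash only pays for prompts that were never judged. A real integration would likely also include the judge model name and sampling settings in the cache key.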

[FT] Is it possible to save the predictions to prevent rerunning expensive inference

Issue encountered: When evaluating large models, inference can incur significant costs and delays, especially on larger datasets. I may also want to re-evaluate my predictions using different metrics. ...

feature/enhancement
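
One way to picture this request: write predictions to disk once, then score the stored file with as many metrics as needed without calling the model again. The sketch below is an assumption-laden illustration, not the evaluation library's API. The JSONL layout, the field names `prediction` and `reference`, and the helper functions are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_predictions(path, records):
    """Write one JSON object per line so the file can be streamed or appended to later."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_predictions(path):
    """Read the JSONL file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def exact_match(records):
    """Fraction of predictions that equal the reference exactly."""
    return sum(r["prediction"] == r["reference"] for r in records) / len(records)


def case_insensitive_match(records):
    """Fraction of predictions that equal the reference ignoring case."""
    hits = sum(r["prediction"].lower() == r["reference"].lower() for r in records)
    return hits / len(records)


# Pretend these came from an expensive inference run (made-up demo data).
records = [
    {"id": 0, "prediction": "Yes", "reference": "yes"},
    {"id": 1, "prediction": "no", "reference": "no"},
]

path = Path(tempfile.mkdtemp()) / "predictions.jsonl"
save_predictions(path, records)

# Later, possibly in another process: reload and apply different metrics.
stored = load_predictions(path)
em = exact_match(stored)
ci = case_insensitive_match(stored)
```

Keeping the raw predictions separate from the scores means a new metric only costs a pass over the stored file.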

[FT] Support llama.cpp inference

Issue encountered: Currently, inference of open models on my Mac is quite slow because vLLM does not support MPS. Solution/Feature: llama.cpp does support MPS and would significantly...

feature/enhancement

Update helm_prompt_settings.jsonl to allow for evaluation of all tasks

HELM currently only evaluates on 5 LegalBench tasks. Ideally, we would like to be able to run evaluation on all tasks. I quickly analyzed the structure of the tasks and...