lighteval
[FT] Adding caching for each dataset run
Issue encountered
When running large evals with many dataset configurations, it is very painful to rerun everything when something fails.
Solution/Feature
It would be great if intermediate results could be cached, for example the computed metrics of each dataset.
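To make the request concrete, here is a minimal sketch of what per-dataset caching could look like. It is not lighteval's API: `run_with_cache`, `cache_key`, the `evaluate` callable, and the JSON-file-per-task layout are all hypothetical names and choices made for illustration. The idea is to store each dataset's computed metrics as soon as they are available, so that a rerun after a failure skips anything already finished.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


def cache_key(model_name: str, task_name: str, task_config: dict) -> str:
    """Hypothetical helper: derive a stable key from model, task and config."""
    payload = json.dumps(
        {"model": model_name, "task": task_name, "config": task_config},
        sort_keys=True,  # same config in a different key order gives the same key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def run_with_cache(
    model_name: str,
    tasks: Dict[str, dict],
    evaluate: Callable[[str, dict], dict],
    cache_dir: str,
) -> Dict[str, dict]:
    """Evaluate each task, reusing metrics already stored in cache_dir."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    results: Dict[str, dict] = {}

    for task_name, task_config in tasks.items():
        key = cache_key(model_name, task_name, task_config)
        path = cache_path / f"{key}.json"

        if path.exists():
            # Metrics for this dataset were computed in an earlier run.
            results[task_name] = json.loads(path.read_text())
            continue

        metrics = evaluate(task_name, task_config)

        # Write to a temporary file first, then rename, so an interrupted
        # run does not leave a half-written cache entry behind.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(metrics))
        tmp.replace(path)
        results[task_name] = metrics

    return results
```

With something like this, if the run crashes on the tenth dataset, the first nine are read back from disk on the next attempt. Keying on the model name and the task configuration means a changed configuration is treated as a new entry rather than returning stale metrics.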