Clémentine Fourrier issues

Results: 43 issues of Clémentine Fourrier

[FT] adding a logprob metric with varying contexts, not choices

Allows launching an MCQA evaluation with several contexts associated with several choices. Each context must map to one specific choice, and log probs are computed on the choices only.
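A minimal sketch of the pairing described above, assuming a hypothetical `choice_logprob(context, choice)` callable that returns the log probability of the choice tokens given the context. The function name and signature are illustrative assumptions, not an existing API.

```python
from typing import Callable, Sequence


def pick_choice(
    contexts: Sequence[str],
    choices: Sequence[str],
    choice_logprob: Callable[[str, str], float],
) -> int:
    """Return the index of the (context, choice) pair with the highest log prob.

    contexts[i] is paired with choices[i]; only the choice is scored,
    the context is used purely as conditioning.
    """
    if len(contexts) != len(choices):
        raise ValueError("each context must map to exactly one choice")
    # choice_logprob is a hypothetical scorer supplied by the caller.
    scores = [choice_logprob(ctx, ch) for ctx, ch in zip(contexts, choices)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The predicted answer is then compared to the gold index as in a regular MCQA metric; the only difference is that the context varies per choice instead of being shared.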

[FT] IFEval and extended tasks are not in the test suite

## Issue encountered

We can therefore accidentally break them without noticing.

## Solution/Feature

Add them to the test suite.

feature request

[FT] Detect max length from perplexity

`we could try measuring ppl given a certain context length. When it starts increasing (or the model throws exceptions), then we have reached max length` -> to explore
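One way this exploration could look, as a sketch only: `perplexity_at` is a hypothetical callable that measures perplexity at a given context length, and the `tolerance` factor for deciding that perplexity "starts increasing" is an assumption, not something specified in the issue.

```python
import math
from typing import Callable, Iterable, Optional


def detect_max_length(
    perplexity_at: Callable[[int], float],
    candidate_lengths: Iterable[int],
    tolerance: float = 1.05,
) -> Optional[int]:
    """Return the largest context length reached before perplexity degrades.

    perplexity_at(n) is a hypothetical callable; it may raise when n
    exceeds what the model supports.
    """
    best_length = None
    best_ppl = math.inf
    for length in sorted(candidate_lengths):
        try:
            ppl = perplexity_at(length)
        except Exception:
            break  # the model threw at this length: treat it as past the limit
        if not math.isfinite(ppl) or ppl > best_ppl * tolerance:
            break  # perplexity started increasing: treat it as past the limit
        best_length = length
        best_ppl = min(best_ppl, ppl)
    return best_length
```

The result is only as fine-grained as the candidate lengths tried, so a coarse sweep followed by a finer one around the break point may be needed.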

feature/enhancement

[DOC] Document the custom model config files

[FT] Add nucleus sampling

https://en.wikipedia.org/wiki/Top-p_sampling
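For reference, a minimal pure-Python sketch of top-p (nucleus) sampling over an explicit token distribution. The function names and the dictionary-based interface are illustrative assumptions; a real implementation would operate on the model's logits.

```python
import random
from typing import Dict, Optional


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize it."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


def sample_top_p(
    probs: Dict[str, float], top_p: float, rng: Optional[random.Random] = None
) -> str:
    """Sample one token from the renormalized nucleus."""
    rng = rng or random.Random()
    nucleus = top_p_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With `top_p=1.0` the filter keeps the full distribution, so the same code path covers plain sampling.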

feature request

See [slack](https://huggingface.slack.com/archives/C055130UYQG/p1721995050911309). Cache each finished task to 1) avoid re-running it, 2) restart evals which failed, and 3) avoid re-calling the model as a judge when re-running.
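A rough sketch of what such per-task caching could look like, assuming a hypothetical `run_task(name)` callable that returns a JSON-serializable result and task names that are safe to use as file names. None of these names come from the original request.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def run_with_cache(
    task_names: Iterable[str],
    run_task: Callable[[str], Any],
    cache_dir: str,
) -> Dict[str, Any]:
    """Run each task once, reusing results cached by earlier runs."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    results: Dict[str, Any] = {}
    for name in task_names:
        cache_file = cache_path / f"{name}.json"
        if cache_file.exists():
            # Finished in a previous run: skip the model (and judge) calls.
            results[name] = json.loads(cache_file.read_text())
            continue
        # run_task is hypothetical; if it raises, nothing is cached for
        # this task, so a later re-run picks up from here.
        result = run_task(name)
        cache_file.write_text(json.dumps(result))
        results[name] = result
    return results
```

Because a result is written only after its task finishes, restarting a failed evaluation re-executes just the tasks that never completed.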

feature request

Extend #228 with few shot samples

#228 introduces kwargs to the CLI to better see the samples. This PR could be extended to also show few-shot samples, with or without truncation, given a max length.

Add evaluations for function calling

cc @lewtun on slack

Fix 341

- Should fix #341:
  - if a task had a greedy metric + a sampling at n metric, it would do sampling at 1 + sampling at n instead
  - associated...

[FT] Provide an interface for easier edit of parametrizable metrics

## Issue encountered

When using a metric which is parametrizable, you need to create a new custom metric to edit default parameters - for some metrics like the BertScorer it's...

feature/enhancement