Clémentine Fourrier

Results 43 issues of Clémentine Fourrier

Allows launching an MCQA evaluation with several contexts associated with several choices - each context must map to one specific choice, and log probs are computed on the choices only.
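A minimal sketch of one way to read this, assuming each context is paired with exactly one choice and the pair with the highest summed log prob wins. `choice_token_logprobs` is a hypothetical callable standing in for the model call; it is not an actual library function, and it is assumed to return log probs for the choice tokens only (context tokens are not scored).

```python
from typing import Callable, List, Sequence, Tuple


def pick_choice(
    pairs: Sequence[Tuple[str, str]],
    choice_token_logprobs: Callable[[str, str], List[float]],
) -> Tuple[int, List[float]]:
    """Score each (context, choice) pair and return the best index plus all scores.

    `choice_token_logprobs(context, choice)` is a hypothetical scorer: it must
    return one log prob per token of `choice`, conditioned on `context`.
    """
    # Sum log probs over the choice tokens only; the context is never scored.
    scores = [sum(choice_token_logprobs(ctx, choice)) for ctx, choice in pairs]
    # The predicted answer is the pair with the highest total log prob.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Summing favors shorter choices; dividing each score by the number of choice tokens is a common alternative, but which normalization the PR uses is not stated here.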

## Issue encountered
We can therefore accidentally break them without noticing.

## Solution/Feature
Add them to the test suite.

feature request

`we could try measuring ppl given a certain context length. When it starts increasing (or the model throws exceptions), then we have reached max length` -> to explore
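The idea quoted above could be explored with something like the sketch below. It assumes a hypothetical callable `mean_nll_at(length)` that runs the model on inputs of the given context length and returns the mean negative log-likelihood per token (raising if the model cannot handle that length). The `tolerance` threshold is also an assumption, not something from the issue.

```python
import math
from typing import Callable, Optional, Sequence


def find_max_context_length(
    mean_nll_at: Callable[[int], float],
    candidate_lengths: Sequence[int],
    tolerance: float = 1.05,
) -> Optional[int]:
    """Return the largest candidate length reached before ppl degrades or the model errors.

    `mean_nll_at` is a hypothetical hook around the model under test.
    `tolerance` is the relative ppl increase (vs. the best seen so far)
    that counts as "starting to increase".
    """
    best_length: Optional[int] = None
    best_ppl = math.inf
    for length in sorted(candidate_lengths):
        try:
            # Perplexity is exp of the mean per-token negative log-likelihood.
            ppl = math.exp(mean_nll_at(length))
        except Exception:
            # The model threw at this length: treat it as past the limit.
            break
        if ppl > best_ppl * tolerance:
            # Perplexity started increasing: the previous length was the max.
            break
        best_ppl = min(best_ppl, ppl)
        best_length = length
    return best_length
```

The function returns `None` when even the shortest candidate fails, so callers can distinguish "no usable length" from a real limit.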

feature/enhancement

https://en.wikipedia.org/wiki/Top-p_sampling

feature request

See [slack](https://huggingface.slack.com/archives/C055130UYQG/p1721995050911309). Cache each finished task to 1) avoid re-running it, 2) restart evals that failed, and 3) avoid re-calling the model as a judge when re-running.
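A minimal sketch of such a per-task cache, assuming results are JSON-serializable and that a task is identified by its name. `run_with_cache` and `run_task` are hypothetical names for illustration, not existing functions.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Union


def run_with_cache(
    task_names: Iterable[str],
    run_task: Callable[[str], Any],
    cache_dir: Union[str, Path],
) -> Dict[str, Any]:
    """Run each task once, reusing cached results from earlier runs.

    `run_task` is a hypothetical callable that evaluates one task (including
    any judge-model calls) and returns a JSON-serializable result.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, Any] = {}
    for name in task_names:
        # Hash the name so characters such as "|" or "/" are safe in file names.
        key = hashlib.sha256(name.encode("utf-8")).hexdigest()
        cache_file = cache_dir / f"{key}.json"
        if cache_file.exists():
            # Finished earlier: skip the model (and the judge) entirely.
            results[name] = json.loads(cache_file.read_text())
            continue
        try:
            result = run_task(name)
        except Exception:
            # Failed tasks are not cached, so the next run retries only them.
            continue
        cache_file.write_text(json.dumps(result))
        results[name] = result
    return results
```

Calling the function a second time with the same `cache_dir` re-runs only the tasks that failed or were never started. A real cache key would probably also need to include the model and the generation parameters.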

feature request

#228 introduces kwargs to the CLI to make the samples easier to inspect. This PR could be extended to also show few-shot samples, with or without truncation, given a max length.

- Should fix #341:
  - if a task had a greedy metric + a sampling at n metric, it would do sampling at 1 + sampling at n instead
- associated...

## Issue encountered
When using a parametrizable metric, you need to create a new custom metric to edit its default parameters - for some metrics like the BertScorer it's...

feature/enhancement