Pinzhen "Patrick" Chen
## 🐛 Bug When the same source, target, and reference files are evaluated using the same wmt22-comet-da checkpoint, `unbabel-comet 2.2.1` under `python3.9` and `unbabel-comet 1.1.2` under `python3.7` gave me dramatically different...
In practice I would have big noisy training data and sample clean data that is representative of the downstream task (e.g. WMT validation sets). It is still difficult for me...
ATM there is no boundary between the selected ones and the available ones. Also, some filters only need to be used once (like "remove whitespace"). Maybe when they are selected, they...
Hi, in this repo, the `chrf` metric [implementation](https://github.com/EleutherAI/lm-evaluation-harness/blob/ebe7226ebfb8d11a9fb8d6b53eb65891f895c633/lm_eval/api/metrics.py#L92C1-L103C52) calls `sacrebleu.corpus_chrf()` with default [parameters](https://github.com/mjpost/sacrebleu/blob/0f351010b8b641aaa59fe75b98d7cc522bf221eb/sacrebleu/compat.py#L94): character order 6 and word order 0. Perhaps in `metrics.py` it would be nice to include those...
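
To illustrate why the character order mentioned in the chrF excerpt above matters, here is a minimal, stdlib-only sketch of a sentence-level character n-gram F-score. It is a simplified illustration, not sacrebleu's implementation: `simple_chrf` and `char_ngrams` are hypothetical names, whitespace handling and averaging are simplified, there is no word n-gram component (i.e. it corresponds to word order 0), and its scores should not be expected to match `sacrebleu.corpus_chrf()`.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                char_order: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 100] for one hypothesis/reference pair.

    Precision and recall are averaged over n-gram orders 1..char_order,
    skipping orders for which either side has no n-grams at all.
    """
    precisions, recalls = [], []
    for n in range(1, char_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    hyp, ref = "the cat", "the hat"
    for order in (1, 3, 6):
        print(order, round(simple_chrf(hyp, ref, char_order=order), 2))
```

Because the score moves with `char_order`, two reported "chrF" numbers are only comparable when the orders are stated. In sacrebleu itself, `corpus_chrf()` accepts `char_order` and `word_order` keyword arguments, and a word order of 2 corresponds to the variant usually called chrF++.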