Leandro von Werra

155 comments by Leandro von Werra

It's also because the metrics on the Hub were updated, and it seems the versioning between `evaluate` and the scripts on the Hub did not work properly.

Thanks @abidlabs! This looks great. Since this will update all evaluation Spaces, would you mind setting up a test Space that installs this branch to verify that this works? You...

Thanks for your feedback; I think I've integrated everything. Let me know if you think anything else needs to be changed!

@lhoestq maybe we can use the `train_eval_index` by default if nothing is specified, and have the option to overwrite (if a `train_eval_index` is available) or provide (if no `train_eval_index` is...
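As a rough illustration of that fallback logic (the helper and its names below are hypothetical, not the actual `evaluate` API):

```python
def resolve_eval_config(user_config=None, train_eval_index=None):
    """Hypothetical helper: an explicit user config always wins; otherwise
    fall back to the dataset's train_eval_index if one exists."""
    if user_config is not None:
        return user_config
    if train_eval_index is not None:
        return train_eval_index
    raise ValueError(
        "Dataset has no train_eval_index; please provide a config explicitly."
    )
```

So a user-supplied config overrides the dataset's default, and the error path covers datasets that ship neither.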

Following up on last week's discussion, here are some ideas on how we could frame the `SubTask`s (a placeholder name; I'm sure there is a better one) of the `EvaluationSuite` as classes/dataclasses....
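A minimal sketch of what such a dataclass could look like. Every field name here is an illustrative guess for discussion, not a final API:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SubTask:
    # All field names are illustrative placeholders, not a final API.
    task_type: str                             # e.g. "text-classification"
    data: str                                  # dataset name on the Hub
    subset: Optional[str] = None               # dataset configuration
    split: str = "test"
    metric: str = "accuracy"
    data_preprocessor: Optional[Callable] = None
    args_for_task: dict = field(default_factory=dict)
```

A dataclass keeps each suite entry declarative and serializable, while `data_preprocessor` leaves room for per-task transformations.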

- **Duplicate work:** If we stick to `datasets` and use `ds.map(data_preprocessor)`, the processing would be cached by default. So if two tasks use the same dataset + configuration...
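The caching behaviour can be pictured with a toy fingerprint-based cache. This is a simplified in-memory stand-in for what `datasets` does with on-disk Arrow files; the hashing scheme here is made up for illustration:

```python
import hashlib
import json

_CACHE = {}  # fingerprint -> processed rows (datasets uses on-disk Arrow files)


def cached_map(rows, fn):
    # Fingerprint the inputs plus the function's bytecode, so two tasks
    # mapping the same data with the same preprocessor share one result.
    key = hashlib.sha256(
        (json.dumps(rows, sort_keys=True) + fn.__code__.co_code.hex()).encode()
    ).hexdigest()
    if key not in _CACHE:
        _CACHE[key] = [fn(r) for r in rows]  # computed only once
    return _CACHE[key]
```

Calling `cached_map` a second time with the same data and preprocessor returns the cached result without re-running the function.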

Thanks for the feedback and your thoughts. Answering the open questions:
- @NimaBoscarino: Oh indeed, the `metric` should be an additional attribute of the data class.
- @mathemakitten: I think...

I have some ideas on how this could be done. I'll hopefully start drafting a PR this week, and I can tag you on it, @Mouhanedg56!

Yes, that's what I had in mind. We probably need to strip out the YAML part of the README first, which can be done with [this regex](https://github.com/huggingface/huggingface_hub/blob/af1d49e9eed3b31be86652cfcfdae9bfba4f3814/src/huggingface_hub/repocard.py#L39). Then we just...
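For illustration, a regex in the same spirit as the linked one (an approximation, not the exact pattern from `repocard.py`) can strip a leading YAML front-matter block like this:

```python
import re

# Matches a leading `---` ... `---` YAML front-matter block (an
# approximation of the regex linked above, not copied verbatim).
YAML_BLOCK = re.compile(r"^---[\r\n]+[\S\s]*?[\r\n]+---[\r\n]+")


def strip_yaml(readme: str) -> str:
    # Remove at most one front-matter block from the top of the card.
    return YAML_BLOCK.sub("", readme, count=1)
```

The lazy `[\S\s]*?` stops at the first closing `---`, and anchoring at `^` ensures only front matter at the top of the file is removed.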

Is that what you are looking for? https://github.com/robinhood/faust/blob/6a9d55b92a18e5bb17b04fc74816f4f88645c476/examples/windowed_aggregation.py#L55