evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Currently, when you load a metric that requires the same library twice, e.g., [chrF](https://huggingface.co/spaces/evaluate-metric/chrf/blob/main/chrf.py#L16-L18), the error message for the missing library mentions that library multiple times. For instance: >...
This works:

```python
metric = evaluate.load('f1')
metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], average=None)
```

This won't work:

```python
metric = evaluate.combine(["f1"])
metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1,...
```
This widget seems like it'd be useful for demonstration purposes, but right now I'm unclear whether it's broken or incomplete. I assume the rows in the columns data (measurement) and...
This PR:
* Refactors the docstrings to avoid duplicates in the `Evaluator` subclasses
* Puts all arguments in the `compute()` signature for subclasses of `Evaluator`, so as to be able...
Hi, thanks for your work on this project! I was surprised to see that your perplexity implementation uses the base-two exponential. See https://github.com/huggingface/evaluate/blob/main/metrics/perplexity/perplexity.py#L183 Is this intended or a bug?
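For context, a small sketch of the two conventions (the probabilities here are illustrative, not taken from the library's code): perplexity is the exponential of the mean negative log-likelihood, and the base of the exponential must match the base of the logarithm, so `2 ** nll` is only correct when the log-likelihoods were taken base 2.

```python
import math

# Illustrative token probabilities (not from the evaluate implementation).
probs = [0.25, 0.5, 0.125, 0.25]

# Natural-log convention: PPL = exp(mean negative ln-likelihood).
ppl_e = math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Base-2 convention: PPL = 2 ** (mean negative log2-likelihood).
ppl_2 = 2 ** (-sum(math.log2(p) for p in probs) / len(probs))

# Both yield the same perplexity because each exponential matches its log.
# Mixing them (e.g. 2 ** natural-log NLL) would understate the value.
```

Since most deep-learning frameworks report losses with the natural log, the question is essentially whether the implementation's log-likelihoods are base 2 to match its `2 **` exponential.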
Currently the `perplexity` metric and measurement both instantiate an entire model object within the `_compute()` function and run inference, which breaks the pattern where only predictions, references, and other metadata...
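For contrast, a minimal sketch of the conventional `_compute()` shape, where the function receives only predictions and references and performs no model inference (the accuracy logic here is just a stand-in, not the library's actual implementation):

```python
# Stand-in metric following the usual pattern: _compute() takes only
# predictions and references; any model inference happens upstream.
def _compute(predictions, references):
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return {"accuracy": correct / len(references)}
```

Under this pattern, the caller runs the model and passes in its outputs, so the metric itself stays cheap and side-effect free.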
`poetry add evaluate`

```
Using version ^0.2.2 for evaluate

Updating dependencies
Resolving dependencies... (86.1s)

Writing lock file

Package operations: 15 installs, 0 updates, 0 removals

• Installing frozenlist (1.3.1)
• ...
```
Caching results from the Evaluator requires checking uniqueness of results against a (model_or_pipeline, dataset, evaluation module) tuple. We can version datasets by accessing their `.fingerprint` attribute, and evaluation modules by...
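One way such a key could be derived (a sketch under the assumption that each component of the tuple can be reduced to a stable string; `cache_key` is a hypothetical helper, not part of the library):

```python
import hashlib

def cache_key(model_id: str, dataset_fingerprint: str, module_hash: str) -> str:
    # Join the identifying (model, dataset, evaluation module) tuple
    # and hash it into a fixed-length, deterministic cache key.
    payload = "::".join([model_id, dataset_fingerprint, module_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the key is deterministic, two runs over the same tuple map to the same cache entry, while changing any component produces a different key.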
Hi, I find the API at https://huggingface.co/metrics quite useful. I am playing around with the video/image captioning task, where CIDEr is a popular metric. Do you plan to add this into...