
Evaluation of form feed symbol with BLEU results in error

Open lowlypalace opened this issue 1 year ago • 2 comments

Hi, I'm generating LLM sequences with some of the HF models, such as pythia-1.4b. Some of my generations result in a sequence consisting only of the form feed character (ASCII 12, chr(12)).

from evaluate import load

bleu = load("bleu")

prediction = "hello"
reference = chr(12)

bleu_score = bleu.compute(
    predictions=[prediction], references=[[reference]]
)["bleu"]

This code results in the following error:

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-1-8625f8bf1df7> in <cell line: 8>()
      6 reference = chr(12)
      7 
----> 8 bleu_score = bleu.compute(
      9     predictions=[prediction], references=[[reference]]
     10 )["bleu"]

2 frames
/usr/local/lib/python3.10/dist-packages/evaluate/module.py in compute(self, predictions, references, **kwargs)
    465             inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
    466             with temp_seed(self.seed):
--> 467                 output = self._compute(**inputs, **compute_kwargs)
    468 
    469             if self.buf_writer is not None:

~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/bleu.py in _compute(self, predictions, references, tokenizer, max_order, smooth)
    120         references = [[tokenizer(r) for r in ref] for ref in references]
    121         predictions = [tokenizer(p) for p in predictions]
--> 122         score = compute_bleu(
    123             reference_corpus=references, translation_corpus=predictions, max_order=max_order, smooth=smooth
    124         )

~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/nmt_bleu.py in compute_bleu(reference_corpus, translation_corpus, max_order, smooth)
    101     geo_mean = 0
    102 
--> 103   ratio = float(translation_length) / reference_length
    104 
    105   if ratio > 1.0:

ZeroDivisionError: float division by zero

The expected behaviour would be that the score is still computed for this character, even though it is non-printable. I believe this will also happen with other non-printable characters. Is this intended behaviour?
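For context, a minimal sketch of why this happens, assuming the metric's default tokenizer behaves like Python's whitespace split here (form feed counts as whitespace, so the reference tokenizes to nothing):

prediction_tokens = "hello".split()  # ["hello"] -> translation_length = 1
reference_tokens = chr(12).split()   # []        -> reference_length = 0

translation_length = len(prediction_tokens)
reference_length = len(reference_tokens)

# The line that fails in compute_bleu (nmt_bleu.py):
try:
    ratio = float(translation_length) / reference_length
except ZeroDivisionError as e:
    print(e)  # float division by zero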

lowlypalace avatar Jun 11 '24 01:06 lowlypalace

A similar issue, just with an empty prediction:

from evaluate import load

bleu = load("bleu")

prediction = ""
reference = "test"

print(bleu.compute(
    predictions=[prediction], references=[[reference]]
))

This leads to a float division by zero three lines later, at bp = math.exp(1 - 1. / ratio).
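For concreteness, a sketch of that failure path, assuming the default tokenizer yields no tokens for the empty prediction and one token for the reference:

import math

translation_length = 0  # "" -> no tokens
reference_length = 1    # "test" -> ["test"]

ratio = float(translation_length) / reference_length  # 0.0

try:
    bp = math.exp(1 - 1. / ratio)  # 1. / 0.0 raises
except ZeroDivisionError as e:
    print(e)  # float division by zero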

The source file is https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py. A fix would be to change:

  ratio = float(translation_length) / reference_length

  if ratio > 1.0:
    bp = 1.
  else:
    bp = math.exp(1 - 1. / ratio)

to:

  ratio = float(translation_length) / min(1, reference_length)

  if ratio > 1.0 or ratio == 1:
    bp = 1.
  else:
    bp = math.exp(1 - 1. / ratio)

AmitMY avatar May 08 '25 12:05 AmitMY

  ratio = float(translation_length) / min(1, reference_length)

  if ratio > 1.0 or ratio == 1:
    bp = 1.
  else:
    bp = math.exp(1 - 1. / ratio)

@AmitMY The "fixed code" may still have a problem: when reference_length = 0, min(1, reference_length) = 0, so we still get a ZeroDivisionError. And it doesn't fix the ZeroDivisionError when translation_length = 0.
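For illustration, a guard that covers both the empty-translation and empty-reference cases could look like the sketch below; the return values chosen for the empty cases are an assumption here, not the behaviour of the upstream script:

import math

def brevity_penalty(translation_length, reference_length):
    # Assumed convention: treat an empty translation or empty reference
    # as maximally penalized instead of dividing by zero.
    if translation_length == 0 or reference_length == 0:
        return 0.0
    ratio = float(translation_length) / reference_length
    if ratio >= 1.0:
        return 1.0
    return math.exp(1 - 1. / ratio)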

shenxiangzhuang avatar Jun 08 '25 15:06 shenxiangzhuang