pycocoevalcap
CIDEr score is 0 while all other metrics are normal
I'm currently using the pycocoevalcap package to evaluate the performance of my image captioning model. I've noticed that the CIDEr score is consistently 0 for all of my model's generated captions, while all other metrics (BLEU, METEOR, SPICE and ROUGE) are normal.
I have tried to run the evaluation on each image separately, but the situation remains the same. The CIDEr score is always 0.
I'm not sure what could be causing this issue, as the other metrics seem to be working correctly. Can anyone help me figure out why the CIDEr score is not being computed correctly?
Thanks in advance for your help!
Were you able to resolve this issue? I am experiencing the same problem
No, I haven't been able to resolve the issue either. I'm still experiencing the same problem.
Could you provide a minimal code example to reproduce this issue? Do you get normal values if you try to run the example from this repository: example/coco_eval_example.py?
Hi @salaniz
I have the same problem as @mlching, and I get normal values for the CIDEr metric with the example from your repository.
Here is an example of what I am running:
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()
# toy dataset for the example
reference = "The cat is black ."
prediction = "The cat is black ."
dict_reference = {'0': [reference]}
dict_prediction = {'0': [prediction]}
# compute BLEU score
scores, _ = scorers["bleu"].compute_score(dict_reference, dict_prediction)  # IT RETURNS 1.0
# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction)  # IT RETURNS 0.0
Thanks in advance for your help!
@salaniz It returns 0.0 if the reference inputs are the same.
The score is 0.0 in the first example below because the references are identical for both images:
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()
# toy dataset for the example
reference1, reference2 = "the cat is black", "the cat is black"
prediction1, prediction2 = "the cat is black", "the eyes are green"
dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}
# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 0.0
print(f'CIDEr: {scores}')
And the score from the code below is 10.0
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()
# toy dataset for the example
reference1, reference2 = "the cat is black", "the eyes are green"
prediction1, prediction2 = "the cat is black", "the eyes are green"
dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}
# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 10.0
print(f'CIDEr: {scores}')
But I did not look into the code any deeper, so I can't tell why.
Actually, this metric uses IDF (inverse document frequency) weighting, so it needs to be computed across the whole dataset at once; the IDF terms are only meaningful when the references vary between images.
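To illustrate the idea (a simplified unigram sketch of CIDEr's TF-IDF weighting, not the actual code in this repository): the IDF weight of an n-gram is roughly log(N / df), where N is the number of images and df is the number of images whose references contain that n-gram. If every image has the same reference, or if there is only one image, then df equals N for every n-gram, all weights collapse to zero, and the score is 0 even for a perfect match.

import math
from collections import Counter

def idf_weights(references):
    # Toy IDF over a list of reference captions, one per image (unigrams only).
    n_images = len(references)
    # Document frequency: in how many images' references does each word appear?
    df = Counter()
    for ref in references:
        df.update(set(ref.split()))
    # IDF weight per word: log(N / df); zero when the word appears in every reference.
    return {word: math.log(n_images / count) for word, count in df.items()}

# Identical references: every word appears in both "documents", so every weight is 0.0.
print(idf_weights(["the cat is black", "the cat is black"]))
# Distinct references: most words now get a non-zero weight, so matching predictions can score above 0.
print(idf_weights(["the cat is black", "the eyes are green"]))

This is also why evaluating one image at a time gives 0: with a single image the "corpus" has one document and every IDF weight is log(1) = 0. Running compute_score once over the whole dataset with its real, varied references gives meaningful values.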