
CIDEr score is 0 while all other metrics are normal

mlching opened this issue • 6 comments

I'm currently using the pycocoevalcap package to evaluate the performance of my image captioning model. I've noticed that the CIDEr score is consistently 0 for all of my model's generated captions, while all other metrics (BLEU, METEOR, SPICE and ROUGE) are normal.

I have tried to run the evaluation on each image separately, but the situation remains the same. The CIDEr score is always 0.

I'm not sure what could be causing this issue, as the other metrics seem to be working correctly. Can anyone help me figure out why the CIDEr score is not being computed correctly?

Thanks in advance for your help!

mlching commented on Jul 25 '23

Were you able to resolve this issue? I am experiencing the same problem

suraj-nair-tri commented on Nov 12 '23

No, I haven't been able to resolve the issue either. I'm still experiencing the same problem.

mlching commented on Nov 13 '23

Could you provide a minimal code example to reproduce this issue? Do you get normal values if you try to run the example from this repository: example/coco_eval_example.py?
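
For reference, that example boils down to roughly the following (a sketch; the file names are placeholders for COCO-format annotation and result files):

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# load COCO-format ground-truth annotations and generated captions
# (placeholder file names)
coco = COCO('annotations/captions_val2014.json')
coco_result = coco.loadRes('results/captions_val2014_results.json')

# run all metrics (BLEU, METEOR, ROUGE_L, CIDEr, SPICE) over the whole result set
coco_eval = COCOEvalCap(coco, coco_result)
coco_eval.params['image_id'] = coco_result.getImgIds()
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print(f'{metric}: {score:.3f}')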

salaniz commented on Jan 19 '24

Hi @salaniz

I have the same problem as @mlching, and I get normal values for the CIDEr metric with the example from your repository.

Here is an example of what I implemented:

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()

# toy dataset for the example
reference = "The cat is black ."
prediction = "The cat is black ."

dict_reference = {'0': [reference]}
dict_prediction = {'0': [prediction]}

# compute BLEU score
scores, _ = scorers["bleu"].compute_score(dict_ref, dict_pre) # IT RETURNS 1.0

# compute CIDER score
scores, _ = scorers["cider"].compute_score(dict_ref, dict_pre) # IT RETURNS 0.0 

Thanks in advance for your help!

theodpzz commented on Feb 23 '24

@salaniz It returns 0.0 when the reference inputs are the same.

The score is 0.0 in the first example below because the two reference captions are identical:

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()

# toy dataset for the example
reference1, reference2 = "the cat is black", "the cat is black"
prediction1, prediction2 = "the cat is black", "the eyes are green"

dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}

# compute CIDER score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 0.0 
print(f'CIDEr: {scores}')

And the score from the code below is 10.0 (the implementation scales CIDEr by a factor of 10, so 10.0 is the maximum):

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()

# toy dataset for the example
reference1, reference2 = "the cat is black", "the eyes are green"
prediction1, prediction2 = "the cat is black", "the eyes are green"

dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}

# compute CIDER score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 10.0 
print(f'CIDEr: {scores}')

But I haven't explored the code any deeper, so I can't tell why.

theodpzz commented on Feb 23 '24

Actually, this metric uses IDF (inverse document frequency) weighting, so it has to be computed across the whole dataset at once: the document frequencies are estimated from the reference captions you pass in, and when all references are identical every n-gram's IDF drops to zero, which zeroes out the score.
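
A rough sketch of that effect (not the library's code; the toy data below just illustrates how identical references zero out the IDF weights):

import math
from collections import Counter

# two images whose reference captions are identical
references = {391895: ["the cat is black"], 522418: ["the cat is black"]}
N = len(references)

# document frequency: in how many images' reference sets does each unigram appear?
df = Counter()
for refs in references.values():
    df.update(set(" ".join(refs).split()))

idf = {gram: math.log(N / count) for gram, count in df.items()}
print(idf)  # every unigram appears in both reference sets, so every IDF is log(2/2) = 0

# With all IDF weights at 0, the TF-IDF vectors that CIDEr compares are all zero,
# so every cosine similarity, and therefore the final score, is 0 as well.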

theophilegervet commented on Feb 29 '24