pycocoevalcap
CIDEr score is 0 while all other metrics are normal
I'm currently using the pycocoevalcap package to evaluate the performance of my image captioning model. I've noticed that the CIDEr score is consistently 0 for all of my model's generated captions, while all other metrics (BLEU, METEOR, SPICE and ROUGE) are normal.
I have tried to run the evaluation on each image separately, but the situation remains the same. The CIDEr score is always 0.
I'm not sure what could be causing this issue, as the other metrics seem to be working correctly. Can anyone help me figure out why the CIDEr score is not being computed correctly?
Thanks in advance for your help!
Were you able to resolve this issue? I am experiencing the same problem
No, I haven't been able to resolve the issue either. I'm still experiencing the same problem.
Could you provide a minimal code example to reproduce this issue? Do you get normal values if you try to run the example from this repository: example/coco_eval_example.py?
Hi @salaniz
I have the same problem as @mlching, and I get normal values for the CIDEr metric with the example from your repository.
Here is an example of what I am running:
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()
# toy dataset for the example
reference = "The cat is black ."
prediction = "The cat is black ."
dict_reference = {'0': [reference]}
dict_prediction = {'0': [prediction]}
# compute BLEU score
scores, _ = scorers["bleu"].compute_score(dict_reference, dict_prediction)  # IT RETURNS 1.0
# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction)  # IT RETURNS 0.0
Thanks in advance for your help!
@salaniz It returns 0.0 if the reference inputs are the same.
The score is 0.0 in the first example below because the references are identical for both images:
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()
# toy dataset for the example
reference1, reference2 = "the cat is black", "the cat is black"
prediction1, prediction2 = "the cat is black", "the eyes are green"
dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}
# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 0.0
print(f'CIDEr: {scores}')
And the score from the code below is 10.0
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
# scorers
scorers = {}
scorers["bleu"] = Bleu(1)
scorers["cider"] = Cider()
# toy dataset for the example
reference1, reference2 = "the cat is black", "the eyes are green"
prediction1, prediction2 = "the cat is black", "the eyes are green"
dict_reference = {391895: [reference1], 522418: [reference2]}
dict_prediction = {391895: [prediction1], 522418: [prediction2]}
# compute CIDEr score
scores, _ = scorers["cider"].compute_score(dict_reference, dict_prediction) # IT RETURNS 10.0
print(f'CIDEr: {scores}')
But I did not look into the code any deeper, so I can't tell why.
Actually, this metric uses IDF (inverse document frequency) weighting, so it needs to be computed across the whole dataset at once; the IDF terms are only meaningful when the references vary between images.
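To illustrate the idea (a simplified unigram sketch of CIDEr's TF-IDF weighting, not the actual code in this repository): the IDF weight of an n-gram is roughly log(N / df), where N is the number of images and df is the number of images whose references contain that n-gram. If every image has the same reference, or if there is only one image, then df equals N for every n-gram, all weights collapse to zero, and the score is 0 even for a perfect match.

import math
from collections import Counter

def idf_weights(references):
    # Toy IDF over a list of reference captions, one per image (unigrams only).
    n_images = len(references)
    # Document frequency: in how many images' references does each word appear?
    df = Counter()
    for ref in references:
        df.update(set(ref.split()))
    # IDF weight per word: log(N / df); zero when the word appears in every reference.
    return {word: math.log(n_images / count) for word, count in df.items()}

# Identical references: every word appears in both "documents", so every weight is 0.0.
print(idf_weights(["the cat is black", "the cat is black"]))
# Distinct references: most words now get a non-zero weight, so matching predictions can score above 0.
print(idf_weights(["the cat is black", "the eyes are green"]))

This is also why evaluating one image at a time gives 0: with a single image the "corpus" has one document and every IDF weight is log(1) = 0. Running compute_score once over the whole dataset with its real, varied references gives meaningful values.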