Inconsistency between Aggregate Results and Indices Results
Okay, kind of a long story, so please bear with me.
It all started when I wanted to merge the Partial and Type evaluation schemes; by that I mean, I wanted the Partial evaluation, but only where the Type was also correct.
So my approach was to get the result_indices from both schemes with the following code:
partial_cor = result_indices['partial']['correct_indices']
partial_inc = result_indices['partial']['incorrect_indices']
partial_par = result_indices['partial']['partial_indices']
ent_type_cor = result_indices['ent_type']['correct_indices']
ent_type_inc = result_indices['ent_type']['incorrect_indices']
# All with Incorrect Type Stay Incorrect
mix_inc = ent_type_inc
# The ones with Correct Type are divided between Correct and Partial according to the Partial evaluation
mix_cor = [x for x in ent_type_cor if x in partial_cor]
mix_par = [x for x in ent_type_cor if x in partial_par]
I would then calculate the aggregated results from the lengths of these arrays.
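As a minimal sketch of that aggregation step (using hypothetical (doc_index, entity_index) pairs standing in for the real result_indices values), it would look like this:

```python
# Hypothetical (doc_index, entity_index) pairs in place of the real
# result_indices values; the merge logic mirrors the snippet above.
ent_type_cor = [(0, 0), (2, 0)]
ent_type_inc = [(1, 0), (3, 0)]
partial_cor = [(0, 0), (1, 0)]
partial_par = [(2, 0), (3, 0)]

# All with incorrect type stay incorrect
mix_inc = ent_type_inc
# The ones with correct type are split between correct and partial
mix_cor = [x for x in ent_type_cor if x in partial_cor]
mix_par = [x for x in ent_type_cor if x in partial_par]

mix = {"correct": len(mix_cor), "incorrect": len(mix_inc), "partial": len(mix_par)}
print(mix)  # {'correct': 1, 'incorrect': 2, 'partial': 1}
```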
I created a fake dataset to validate my algorithm:
test_dataset = [
    # Correct Entity and Correct Type
    {
        "true": [{"label": "COR", "start": 0, "end": 10}],
        "pred": [{"label": "COR", "start": 0, "end": 10}]
    },
    # Correct Entity and Incorrect Type
    {
        "true": [{"label": "COR", "start": 0, "end": 10}],
        "pred": [{"label": "INC", "start": 0, "end": 10}]
    },
    # Partial Entity and Correct Type
    {
        "true": [{"label": "COR", "start": 0, "end": 10}],
        "pred": [{"label": "COR", "start": 1, "end": 9}]
    },
    # Partial Entity and Incorrect Type
    {
        "true": [{"label": "COR", "start": 0, "end": 10}],
        "pred": [{"label": "INC", "start": 1, "end": 9}]
    }
]
true = [x['true'] for x in test_dataset]
pred = [x['pred'] for x in test_dataset]
I was expecting the following results:
"partial": {
"correct": 2,
"incorrect": 0,
"partial": 2
}
"ent_type": {
"correct": 2,
"incorrect": 2,
"partial": 0
}
"mix": {
"correct": 1,
"incorrect": 2,
"partial": 1
}
When I ran the code, I got what I expected for the aggregated results, but not for the result indices:
evaluator = Evaluator(true, pred, tags=['COR', 'INC'], loader="default")
results, results_per_tag, result_indices, result_indices_by_tag = evaluator.evaluate()
print("Type")
print(json.dumps(results['ent_type'], indent=4, ensure_ascii=False))
print("Partial")
print(json.dumps(results['partial'], indent=4, ensure_ascii=False))
>>>>
Type
{
    "correct": 2,
    "incorrect": 2,
    "partial": 0,
    "missed": 0,
    "spurious": 0,
    "possible": 4,
    "actual": 4,
    "precision": 0.5,
    "recall": 0.5,
    "f1": 0.5
}
Partial
{
    "correct": 2,
    "incorrect": 0,
    "partial": 2,
    "missed": 0,
    "spurious": 0,
    "possible": 4,
    "actual": 4,
    "precision": 0.75,
    "recall": 0.75,
    "f1": 0.75
}
But, for some reason, the indices did not match:
print("Type")
print(json.dumps(result_indices['ent_type'], indent=4, ensure_ascii=False))
print("Partial")
print(json.dumps(result_indices['partial'], indent=4, ensure_ascii=False))
Type
{
    "correct_indices": [
        [0, 0]
    ],
    "incorrect_indices": [
        [1, 0],
        [3, 0]
    ],
    "partial_indices": [],
    "missed_indices": [],
    "spurious_indices": []
}
Partial
{
    "correct_indices": [
        [0, 0]
    ],
    "incorrect_indices": [],
    "partial_indices": [
        [2, 0],
        [3, 0]
    ],
    "missed_indices": [],
    "spurious_indices": []
}
It really bugged me, because the Type evaluation should have added index [2, 0] to its correct_indices, and the Partial evaluation should have added index [1, 0] to its correct_indices. If these entities are counted in the aggregated view, they should also appear in the indices view, right?
I apologize if I am missing something obvious. I would appreciate any help understanding this outcome, and any pointers on another way to achieve my desired result with this library.
Hey @rodrigues-pedro, thanks so much for taking the time to flag this. I added the indices results feature. I just saw this issue by chance - I'm tight for time at the moment, but I will look into this as soon as I can.
Hey @rodrigues-pedro, sorry for such a delay, but better late than never 😅
I tried to replicate your problem, and it seems to be solved already (nice job @davidsbatista!):
def test_evaluation_type_merge():
    test_dataset = [
        # Correct Entity and Correct Type
        {
            "true": [{"label": "COR", "start": 0, "end": 10}],
            "pred": [{"label": "COR", "start": 0, "end": 10}]
        },
        # Correct Entity and Incorrect Type
        {
            "true": [{"label": "COR", "start": 0, "end": 10}],
            "pred": [{"label": "INC", "start": 0, "end": 10}]
        },
        # Partial Entity and Correct Type
        {
            "true": [{"label": "COR", "start": 0, "end": 10}],
            "pred": [{"label": "COR", "start": 1, "end": 9}]
        },
        # Partial Entity and Incorrect Type
        {
            "true": [{"label": "COR", "start": 0, "end": 10}],
            "pred": [{"label": "INC", "start": 1, "end": 9}]
        }
    ]
    true = [x['true'] for x in test_dataset]
    pred = [x['pred'] for x in test_dataset]
    evaluator = Evaluator(true, pred, tags=['COR', 'INC'], loader="default")
    results = evaluator.evaluate()

    # Aggregated
    results['overall']['ent_type']
    # EvaluationResult(correct=2, incorrect=2, partial=0, missed=0, spurious=0, precision=0.5, recall=0.5, f1=0.5, actual=4, possible=4)
    results['overall']['partial']
    # EvaluationResult(correct=2, incorrect=0, partial=2, missed=0, spurious=0, precision=0.5, recall=0.5, f1=0.5, actual=4, possible=4)

    # Indices
    results['overall_indices']['ent_type']
    # EvaluationIndices(correct_indices=[(0, 0), (2, 0)], incorrect_indices=[(1, 0), (3, 0)], partial_indices=[], missed_indices=[], spurious_indices=[])
    results['overall_indices']['partial']
    # EvaluationIndices(correct_indices=[(0, 0), (1, 0)], incorrect_indices=[], partial_indices=[(2, 0), (3, 0)], missed_indices=[], spurious_indices=[])
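With the indices now consistent with the aggregated counts, the merge asked about in the original post can be done over these tuples. Here is a sketch of a helper (not part of the library's API; the tuples are copied from the comments above, and sets are used for the membership tests):

```python
def merge_type_and_partial(ent_type_indices, partial_indices):
    """Hypothetical helper: entities with an incorrect type stay incorrect;
    entities with a correct type are split into correct/partial according
    to the partial evaluation. Values are lists of
    (doc_index, entity_index) tuples."""
    type_correct = set(ent_type_indices["correct"])
    return {
        "correct": sorted(type_correct & set(partial_indices["correct"])),
        "incorrect": list(ent_type_indices["incorrect"]),
        "partial": sorted(type_correct & set(partial_indices["partial"])),
    }

# Tuples taken from the index output above
ent_type = {"correct": [(0, 0), (2, 0)], "incorrect": [(1, 0), (3, 0)]}
partial = {"correct": [(0, 0), (1, 0)], "partial": [(2, 0), (3, 0)]}

mix = merge_type_and_partial(ent_type, partial)
print({k: len(v) for k, v in mix.items()})  # {'correct': 1, 'incorrect': 2, 'partial': 1}
```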
Since this seems to be solved in the new release, I will close this issue.