F1 metric is nan when it should be 0

Open nikanar opened this issue 5 years ago • 6 comments

In a particular test, my system failed to produce any events for one of the reference classes (C). As expected, its recall is 0 and its precision is NaN on that particular class (see below). Yet the F1 score on that class should, IMHO, be 0: yes, 2pr/(p+r) formally evaluates to 0/0, but at a higher level it is a full miss, so the denominator should really be treated as an ε, giving 0.

In turn, the class-wise macro-average should count class C as 0 rather than skip it as NaN (as it does now), i.e. it should be the average over A and C instead of over A alone.

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec    | ER       Del      Ins    | Sens     Spec     Bacc     Acc     
    ------------ | -----   ----- | ------   ------   ------ | ------   ------   ------ | ------   ------   ------   ------  
    A            | 37      30    | 74.6%    83.3%    67.6%  | 0.46     0.32     0.14   | 67.6%    97.2%    82.4%    92.1%   
    B            | 0       0     | nan%     nan%     nan%   | 0.00     0.00     0.00   | 0.0%     100.0%   50.0%    100.0%  
    C            | 33      0     | nan%     nan%     0.0%   | 1.00     1.00     0.00   | 0.0%     100.0%   50.0%    84.7%   
    D            | 0       0     | nan%     nan%     nan%   | 0.00     0.00     0.00   | 0.0%     100.0%   50.0%    100.0%  
  Class-wise average metrics (macro-average)
  ======================================
  F-measure
    F-measure (F1)                  : 74.63 %
    Precision                       : 83.33 %
    Recall                          : 33.78 %
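
For what it is worth, here is a minimal sketch of the convention I am arguing for, in plain Python (illustration only, not sed_eval code): a class with reference events but no system output scores F1 = 0, and it stays in the macro-average.

  import math

  def f1(precision, recall):
      # Proposed convention: a class with reference events but no system output
      # (precision = NaN, recall = 0) counts as a full miss, i.e. F1 = 0.
      if math.isnan(precision) or math.isnan(recall):
          return 0.0
      if precision + recall == 0:
          return 0.0
      return 2 * precision * recall / (precision + recall)

  # Classes with Nref > 0 in the table above: A and C.
  class_f1 = [f1(0.833, 0.676), f1(float('nan'), 0.0)]
  print(sum(class_f1) / len(class_f1))  # ~0.373, instead of the 0.746 reported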

nikanar avatar Sep 27 '18 16:09 nikanar

In this situation, there seems to be no single clear answer about how the toolkit should handle it:

I tend to agree that in most cases we want class C to be counted as 0. This is possible with the empty_system_output_handling='zero_score' parameter (although right now it is applied to all classes).

However, how to treat classes B and D seems to depend heavily on the application, right?

  • You could argue that not finding a class you are not supposed to find should be rewarded (100%) in the class-wise average, because the opposite case (Nref=0 and Nsys > 0) is penalized.
  • Or you could argue that Nref=0 with Nsys=0 is expected (for example, the class simply does not occur in this sub-problem), so the class should not be taken into account in the average at all.

The B and D cases seem more a philosophical question than a code problem; any thoughts? (A quick sketch of the alternatives is below.)
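
To make the trade-off concrete, here is a small illustrative sketch in plain Python (not sed_eval code; class C is already counted as 0 per the zero_score convention, and the policy names are made up for this example):

  # Per-class F1 from the table above; None marks a class with Nref = 0 and Nsys = 0.
  class_f1 = {'A': 0.746, 'B': None, 'C': 0.0, 'D': None}

  def macro_f1(scores, empty_class_policy):
      """Macro-average F1 under a given policy for Nref = 0, Nsys = 0 classes."""
      values = []
      for f in scores.values():
          if f is None:
              if empty_class_policy == 'reward':    # count the class as a perfect score
                  values.append(1.0)
              elif empty_class_policy == 'zero':    # count the class as a full miss
                  values.append(0.0)
              # 'exclude': leave the class out of the average entirely
          else:
              values.append(f)
      return sum(values) / len(values)

  for policy in ('reward', 'zero', 'exclude'):
      print(policy, round(macro_f1(class_f1, policy), 3))
  # reward 0.687, zero 0.187, exclude 0.373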

turpaultn avatar Jun 12 '19 09:06 turpaultn

Can you please guide me as to where I should add the empty_system_output_handling='zero_score' parameter?

zhordiffallah avatar Sep 17 '21 15:09 zhordiffallah

You can set this parameter when you create the evaluator object:

  import sed_eval

  event_based_metric_zero = sed_eval.sound_event.EventBasedMetrics(
      event_label_list=['event A', 'event B'],
      t_collar=0.200,
      percentage_of_length=0.2,
      empty_system_output_handling='zero_score'
  )
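
After that, evaluation proceeds as usual; a short usage sketch (the file names here are only placeholders):

  reference_event_list = sed_eval.io.load_event_list('reference.txt')
  estimated_event_list = sed_eval.io.load_event_list('estimated.txt')

  event_based_metric_zero.evaluate(
      reference_event_list=reference_event_list,
      estimated_event_list=estimated_event_list
  )

  # Printing the evaluator produces the full report; results() returns the
  # same metrics as a dictionary, including the class-wise averages affected
  # by empty_system_output_handling.
  print(event_based_metric_zero)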

toni-heittola avatar Sep 17 '21 20:09 toni-heittola

Thank you for your reply. However, my problem is with segment-based evaluation: I have tried setting empty_system_output_handling='zero_score' when creating the segment-based evaluator, without any success. Here is a screenshot of my results (attachment: Capture d’écran (42)).

zhordiffallah avatar Sep 17 '21 20:09 zhordiffallah

OK, it seems that there was a bug in the code: this parameter was not properly handled for SegmentBasedMetrics, only for EventBasedMetrics. It is now fixed with commit dd266286151dd779f743005036ad64d48c3c4692, which is available in the develop branch.
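
With that fix, the segment-based evaluator accepts the same parameter; a minimal sketch (the labels and time resolution below are just examples, and this assumes the develop branch):

  import sed_eval

  # Requires the develop branch (commit dd26628 or later).
  segment_based_metric_zero = sed_eval.sound_event.SegmentBasedMetrics(
      event_label_list=['event A', 'event B'],
      time_resolution=1.0,
      empty_system_output_handling='zero_score'
  )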

toni-heittola avatar Sep 17 '21 20:09 toni-heittola

Thank you so much for your help!

zhordiffallah avatar Sep 17 '21 20:09 zhordiffallah