
Is it possible to find Precision and Recall of multi-class classification using default metrics?

Open Nawrin14 opened this issue 2 years ago • 2 comments

I am trying to find the accuracy, precision, and recall for a multi-class classification model. The model is training and the accuracy is increasing each round. However, the accuracy, precision, and recall values are always identical, both per round and for the final global model. Usually these three metrics differ, but I am getting the same value for all of them. What could be the reason for that? Do Precision and Recall not work properly for multi-class classification?

import tensorflow as tf
import tensorflow_federated as tff

def create_tff_model():
  return tff.learning.from_keras_model(
      build_model(),
      input_spec=train_datasets[0].element_spec,
      loss=tf.keras.losses.CategoricalCrossentropy(),
      metrics=[tf.keras.metrics.CategoricalAccuracy(),
               tf.keras.metrics.Precision(),
               tf.keras.metrics.Recall()])

Here's the output after running the model for 5 rounds

metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9552976), ('precision', 0.971948), ('recall', 0.9260119), ('loss', 0.13934681), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9877381), ('precision', 0.9877381), ('recall', 0.9877381), ('loss', 0.030179854), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9960119), ('precision', 0.9960119), ('recall', 0.9960119), ('loss', 0.014143298), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9997619), ('precision', 0.9997619), ('recall', 0.9997619), ('loss', 0.0044228216), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.99994045), ('precision', 0.99994045), ('recall', 0.99994045), ('loss', 0.001096289), ('num_examples', 16800), ('num_batches', 3360)]))])

Here's the output for the final global model

OrderedDict([('eval', OrderedDict([('categorical_accuracy', 0.92105263), ('precision', 0.92105263), ('recall', 0.92105263), ('loss', 0.43225765), ('num_examples', 5700), ('num_batches', 1140)]))])
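For context: `tf.keras.metrics.Precision` and `tf.keras.metrics.Recall` are threshold-based binary metrics (default threshold 0.5) that count true/false positives element-wise over the one-hot label and prediction matrices. Once every softmax prediction has its maximum probability above 0.5, each example contributes exactly one predicted positive, so precision, recall, and categorical accuracy all collapse to the same number. A minimal pure-Python sketch of that counting, with illustrative data (not from the issue):

```python
# Sketch of the element-wise thresholded counting used by
# tf.keras.metrics.Precision/Recall (default threshold 0.5).
THRESHOLD = 0.5

# One-hot labels and softmax-like predictions for 4 examples, 3 classes.
labels = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
preds = [
    [0.90, 0.05, 0.05],  # correct, confident
    [0.10, 0.80, 0.10],  # correct, confident
    [0.70, 0.20, 0.10],  # wrong, confident
    [0.85, 0.10, 0.05],  # correct, confident
]

tp = fp = fn = 0
for y, p in zip(labels, preds):
    for yi, pi in zip(y, p):
        pred_pos = pi > THRESHOLD
        if pred_pos and yi == 1:
            tp += 1          # predicted positive, label positive
        elif pred_pos and yi == 0:
            fp += 1          # predicted positive, label negative
        elif not pred_pos and yi == 1:
            fn += 1          # predicted negative, label positive

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = 3 / 4  # 3 of 4 argmax predictions are correct

print(precision, recall, accuracy)  # all three are 0.75
```

When the model is still uncertain (some max probabilities at or below 0.5), the counts diverge, which matches the differing round-1 numbers above. For genuinely per-class values, Keras supports `Precision(class_id=k)` and `Recall(class_id=k)`.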

Nawrin14 avatar May 14 '22 06:05 Nawrin14

Hi @Nawrin14. This is certainly suspicious. Can you give a bit more detail? In particular, as part of our bug filing system we encourage you to submit the following:

  • Python package versions (e.g., TensorFlow Federated, TensorFlow):
  • Python version:
  • What TensorFlow Federated execution stack are you using?

Additionally, can you give a bit more info on the datasets and the build_model function being used? In particular, any kind of minimal reproduction of this (potentially in colab) would be extremely useful.

zcharles8 avatar May 18 '22 23:05 zcharles8

Hi @zcharles8. I am experimenting with the iris dataset. I am using TensorFlow Federated version 0.20.0 and Python version 3.7.13. I am attaching the colab link here:

https://colab.research.google.com/drive/1mWwCxWPL-QSYjnc9GryZNomlrJUP2VI7?usp=sharing

Nawrin14 avatar May 19 '22 12:05 Nawrin14