Is it possible to find Precision and Recall for multi-class classification using the default metrics?
I am trying to compute accuracy, precision, and recall for a multi-class classification model. The model trains fine and the accuracy increases each round. However, the three metric values are almost always identical, in every training round and for the final global model as well. Normally accuracy, precision, and recall do not all take the same value, yet here they do. What could be the reason for that? Do Precision and Recall not work properly for multi-class classification?
import tensorflow as tf
import tensorflow_federated as tff

def create_tff_model():
    return tff.learning.from_keras_model(
        build_model(),
        input_spec=train_datasets[0].element_spec,
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy(),
                 tf.keras.metrics.Precision(),
                 tf.keras.metrics.Recall()])
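For reference, Keras's Precision and Recall accept a class_id argument that restricts the metric to a single class, which is one way to get genuinely per-class values. A minimal sketch, where the 3-class count and the dummy batch are assumptions, not taken from the notebook above:

import tensorflow as tf

num_classes = 3  # assumption: a 3-class problem
per_class_metrics = []
for c in range(num_classes):
    per_class_metrics.append(tf.keras.metrics.Precision(class_id=c, name='precision_%d' % c))
    per_class_metrics.append(tf.keras.metrics.Recall(class_id=c, name='recall_%d' % c))

# Dummy one-hot batch to show that each metric tracks only its own class.
y_true = tf.constant([[1., 0., 0.], [0., 1., 0.]])
y_pred = tf.constant([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
for m in per_class_metrics:
    m.update_state(y_true, y_pred)
    print(m.name, m.result().numpy())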
Here's the output after running the model for 5 rounds:
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9552976), ('precision', 0.971948), ('recall', 0.9260119), ('loss', 0.13934681), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9877381), ('precision', 0.9877381), ('recall', 0.9877381), ('loss', 0.030179854), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9960119), ('precision', 0.9960119), ('recall', 0.9960119), ('loss', 0.014143298), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9997619), ('precision', 0.9997619), ('recall', 0.9997619), ('loss', 0.0044228216), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.99994045), ('precision', 0.99994045), ('recall', 0.99994045), ('loss', 0.001096289), ('num_examples', 16800), ('num_batches', 3360)]))])
Here's the output for the final global model:
OrderedDict([('eval', OrderedDict([('categorical_accuracy', 0.92105263), ('precision', 0.92105263), ('recall', 0.92105263), ('loss', 0.43225765), ('num_examples', 5700), ('num_batches', 1140)]))])
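For context on the identical numbers: tf.keras.metrics.Precision and tf.keras.metrics.Recall are binary (multi-label) metrics. With one-hot labels they micro-average over every (example, class) entry using a 0.5 threshold, so once the model outputs exactly one probability above 0.5 per example, every misclassified example adds exactly one false positive and one false negative, and both metrics collapse to the same value as accuracy. A small sketch with dummy data (all values hypothetical):

import tensorflow as tf

y_true = tf.constant([[1., 0., 0.],
                      [0., 1., 0.],
                      [0., 0., 1.],
                      [1., 0., 0.]])
y_pred = tf.constant([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1],
                      [0.7, 0.2, 0.1],   # confident but wrong
                      [0.85, 0.1, 0.05]])

for m in (tf.keras.metrics.CategoricalAccuracy(),
          tf.keras.metrics.Precision(),
          tf.keras.metrics.Recall()):
    m.update_state(y_true, y_pred)
    print(m.name, m.result().numpy())  # all three report 0.75

This would also be consistent with round 1 above, where precision > recall: presumably some early predictions fell below the 0.5 threshold, so the counts of false positives and false negatives differed.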
Hi @Nawrin14. This is certainly suspicious. Can you give a bit more detail? In particular, as part of our bug filing system we encourage you to submit the following:
- Python package versions (e.g., TensorFlow Federated, TensorFlow):
- Python version:
- What TensorFlow Federated execution stack are you using?
Additionally, can you give a bit more info on the datasets and the build_model function being used? In particular, any kind of minimal reproduction (potentially in Colab) would be extremely useful.
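For completeness, a quick way to gather the requested version information (a minimal sketch):

import sys
import tensorflow as tf
import tensorflow_federated as tff

print('Python:', sys.version)
print('TensorFlow:', tf.__version__)
print('TensorFlow Federated:', tff.__version__)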
Hi @zcharles8. I am experimenting with the iris dataset. I am using TensorFlow Federated 0.20.0 and Python 3.7.13. Here is the Colab link:
https://colab.research.google.com/drive/1mWwCxWPL-QSYjnc9GryZNomlrJUP2VI7?usp=sharing
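For reference, a hypothetical reconstruction of the kind of setup described above (not the actual notebook): iris features, one-hot labels for 3 classes, split across a few simulated clients. The client count and batch size are assumptions:

import numpy as np
import tensorflow as tf
from sklearn.datasets import load_iris

features, labels = load_iris(return_X_y=True)
onehot = tf.keras.utils.to_categorical(labels, num_classes=3)

num_clients = 3  # assumption
train_datasets = [
    tf.data.Dataset.from_tensor_slices(
        (features[i::num_clients].astype(np.float32),
         onehot[i::num_clients].astype(np.float32)))
    .batch(5)
    for i in range(num_clients)
]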