Is it possible to find Precision and Recall for multi-class classification using the default metrics?
I am trying to compute accuracy, precision, and recall for a multi-class classification model. The model trains fine and the accuracy increases each round. However, the three metric values are almost always identical, in every training round and for the final global model as well. Normally accuracy, precision, and recall do not all take the same value, yet here they do. What could be the reason for that? Do Precision and Recall not work properly for multi-class classification?
import tensorflow as tf
import tensorflow_federated as tff

def create_tff_model():
    return tff.learning.from_keras_model(
        build_model(),
        input_spec=train_datasets[0].element_spec,
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy(),
                 tf.keras.metrics.Precision(),
                 tf.keras.metrics.Recall()])
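For reference, Keras's Precision and Recall accept a class_id argument that restricts the metric to a single class, which is one way to get genuinely per-class values. A minimal sketch, where the 3-class count and the dummy batch are assumptions, not taken from the notebook above:

import tensorflow as tf

num_classes = 3  # assumption: a 3-class problem
per_class_metrics = []
for c in range(num_classes):
    per_class_metrics.append(tf.keras.metrics.Precision(class_id=c, name='precision_%d' % c))
    per_class_metrics.append(tf.keras.metrics.Recall(class_id=c, name='recall_%d' % c))

# Dummy one-hot batch to show that each metric tracks only its own class.
y_true = tf.constant([[1., 0., 0.], [0., 1., 0.]])
y_pred = tf.constant([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
for m in per_class_metrics:
    m.update_state(y_true, y_pred)
    print(m.name, m.result().numpy())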
Here's the output after running the model for 5 rounds:
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9552976), ('precision', 0.971948), ('recall', 0.9260119), ('loss', 0.13934681), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9877381), ('precision', 0.9877381), ('recall', 0.9877381), ('loss', 0.030179854), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9960119), ('precision', 0.9960119), ('recall', 0.9960119), ('loss', 0.014143298), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.9997619), ('precision', 0.9997619), ('recall', 0.9997619), ('loss', 0.0044228216), ('num_examples', 16800), ('num_batches', 3360)]))])
metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.99994045), ('precision', 0.99994045), ('recall', 0.99994045), ('loss', 0.001096289), ('num_examples', 16800), ('num_batches', 3360)]))])
Here's the output for the final global model:
OrderedDict([('eval', OrderedDict([('categorical_accuracy', 0.92105263), ('precision', 0.92105263), ('recall', 0.92105263), ('loss', 0.43225765), ('num_examples', 5700), ('num_batches', 1140)]))])
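For context on the identical numbers: tf.keras.metrics.Precision and tf.keras.metrics.Recall are binary (multi-label) metrics. With one-hot labels they micro-average over every (example, class) entry using a 0.5 threshold, so once the model outputs exactly one probability above 0.5 per example, every misclassified example adds exactly one false positive and one false negative, and both metrics collapse to the same value as accuracy. A small sketch with dummy data (all values hypothetical):

import tensorflow as tf

y_true = tf.constant([[1., 0., 0.],
                      [0., 1., 0.],
                      [0., 0., 1.],
                      [1., 0., 0.]])
y_pred = tf.constant([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1],
                      [0.7, 0.2, 0.1],   # confident but wrong
                      [0.85, 0.1, 0.05]])

for m in (tf.keras.metrics.CategoricalAccuracy(),
          tf.keras.metrics.Precision(),
          tf.keras.metrics.Recall()):
    m.update_state(y_true, y_pred)
    print(m.name, m.result().numpy())  # all three report 0.75

This would also be consistent with round 1 above, where precision > recall: presumably some early predictions fell below the 0.5 threshold, so the counts of false positives and false negatives differed.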
Hi @Nawrin14. This is certainly suspicious. Can you give a bit more detail? In particular, as part of our bug filing system we encourage you to submit the following:
- Python package versions (e.g., TensorFlow Federated, TensorFlow):
- Python version:
- What TensorFlow Federated execution stack are you using?
Additionally, can you give a bit more info on the datasets and the build_model function being used? In particular, any kind of minimal reproduction (potentially in Colab) would be extremely useful.
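For completeness, a quick way to gather the requested version information (a minimal sketch):

import sys
import tensorflow as tf
import tensorflow_federated as tff

print('Python:', sys.version)
print('TensorFlow:', tf.__version__)
print('TensorFlow Federated:', tff.__version__)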
Hi @zcharles8. I am experimenting with the iris dataset. I am using TensorFlow Federated 0.20.0 and Python 3.7.13. Here is the Colab link:
https://colab.research.google.com/drive/1mWwCxWPL-QSYjnc9GryZNomlrJUP2VI7?usp=sharing
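For reference, a hypothetical reconstruction of the kind of setup described above (not the actual notebook): iris features, one-hot labels for 3 classes, split across a few simulated clients. The client count and batch size are assumptions:

import numpy as np
import tensorflow as tf
from sklearn.datasets import load_iris

features, labels = load_iris(return_X_y=True)
onehot = tf.keras.utils.to_categorical(labels, num_classes=3)

num_clients = 3  # assumption
train_datasets = [
    tf.data.Dataset.from_tensor_slices(
        (features[i::num_clients].astype(np.float32),
         onehot[i::num_clients].astype(np.float32)))
    .batch(5)
    for i in range(num_clients)
]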