tfa.metrics.F1Score gives ValueError for binary classification with shape (n,)
System information
- OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
- TensorFlow: 2.0.0 via tf docker image ('v2.0.0-rc2-26-g64c3d38')
- TensorFlow-Addons: 0.6.0 via pip
- Python version: 3.6.8 [GCC 8.3.0] on linux
- Is GPU used? (yes/no): tried both
Describe the bug
tfa.metrics.F1Score gives a ValueError for a binary classification problem encoded with shape (n,).
ValueError: Shapes must be equal rank, but are 1 and 0 for 'AssignAddVariableOp' (op: 'AssignAddVariableOp') with input shapes: [], [].
Code to reproduce the issue
import tensorflow as tf
import tensorflow_addons as tfa
f1 = tfa.metrics.F1Score(num_classes=2, average=None)
y_true = tf.constant([1,0,0,1,0])
y_pred = tf.constant([1,1,0,0,1])
f1.update_state(y_true, y_pred)
Other info / logs
I would expect this input to work, as it does with tfa.metrics.CohenKappa. If it walks like a tfa.metric and quacks like a tfa.metric, it is a tfa.metric.
E.g.:
import tensorflow as tf
import tensorflow_addons as tfa
k = tfa.metrics.CohenKappa(num_classes=2)
y_true = tf.constant([1,0,0,1,0])
y_pred = tf.constant([1,1,0,0,1])
k.update_state(y_true, y_pred)
yields
<tf.Tensor: id=196, shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
[1, 1]], dtype=int32)>
Let me reproduce and see how it goes
@MadsAdrian
Thanks. The computation is different for the two metrics.
- In F1Score, the number of columns in your input is equal to the number of classes. Your input shape is (1, 5), so the num_classes parameter should have the value 5.
- y_pred expects a float input data type.
- Providing the shape of your input will resolve this issue.
Please see the example below.
import tensorflow as tf
import tensorflow_addons as tfa
f1 = tfa.metrics.F1Score(num_classes=5, average=None)
y_true = tf.constant([1,0,0,1,0], shape=(1,5))
y_pred = tf.constant([1,1,0,0,1], shape=(1,5), dtype=tf.float32)
f1.update_state(y_true, y_pred)
print('F1 Score is: ', f1.result().numpy())
Let me know if you need any clarification.
Thanks for the clarifying response. This at least allows a workaround in my code.
However, I don't think you fully understood my inquiry. As stated in the original post, I would expect duck typing to work across the tfa metrics wherever a binary vector with shape (n,) makes sense. My case is actually num_classes=2, and I assumed that inputs of shape (n,) were acceptable (as is the case with tfa.metrics.CohenKappa).
With regard to #430, I suppose the migrated class should be called CategoricalF1Score. See tf.metrics.CategoricalAccuracy and tf.metrics.BinaryAccuracy, which highlight a distinction that has not been made for tfa.metrics.F1Score.
I suppose
"The number of columns in the input should be equal to the number of classes, i.e. y_true and y_pred should be one-hot encoded on axis=1."
should be part of the documentation?
The code I'm using is like this, but as you can see, I did not get it to work with F1Score as it is truly a CategoricalF1Score.
self.change_map_metrics = {
"ACC": tf.keras.metrics.Accuracy(),
"cohens kappa": CohenKappa(num_classes=2),
# "F1": tfa.metrics.F1Score(num_classes=2, average=None),
}
...
# y_true and y_pred are (1,h,w,1) images in {0,1}
y_true, y_pred = tf.reshape(y_true, [-1]), tf.reshape(y_pred, [-1])
for name, metric in self.change_map_metrics.items():
metric.update_state(y_true, y_pred)
Furthermore, the following
import tensorflow as tf
import tensorflow_addons as tfa
t, p = [1,1,0,1,1,0], [1,0,1,1,0,1]
y_true = tf.constant(t, shape=(6,1))
y_pred = tf.constant(p, shape=(6,1), dtype=tf.float32)
m = tfa.metrics.F1Score(num_classes=2, average=None)
m.update_state(y_true, y_pred)
produces a ValueError (as expected), since y_true and y_pred are not one-hot encoded on axis=1.
oh_y_true = tf.one_hot(t, depth=2)
oh_y_pred = tf.one_hot(p, depth=2)
m.update_state(oh_y_true, oh_y_pred)
m.result() => <tf.Tensor: id=102, shape=(2,), dtype=float32, numpy=array([0. , 0.5], dtype=float32)>
which is as expected with average=None.
The behavior I initially expected is like that of sklearn:
from sklearn.metrics import f1_score
f1_score(t,p) => 0.5
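In the meantime, I suppose I can fold the one-hot trick into my metric loop. A rough, untested sketch of what I have in mind (the dictionary and tensors mirror my snippet above; the isinstance check is just one way to special-case F1Score):
import tensorflow as tf
import tensorflow_addons as tfa
change_map_metrics = {
    "ACC": tf.keras.metrics.Accuracy(),
    "cohens kappa": tfa.metrics.CohenKappa(num_classes=2),
    "F1": tfa.metrics.F1Score(num_classes=2, average=None),
}
# y_true and y_pred flattened to shape (n,) as before
y_true = tf.constant([1, 1, 0, 1, 1, 0])
y_pred = tf.constant([1, 0, 1, 1, 0, 1])
for name, metric in change_map_metrics.items():
    if isinstance(metric, tfa.metrics.F1Score):
        # F1Score wants (n, num_classes) one-hot / float inputs
        metric.update_state(tf.one_hot(y_true, depth=2),
                            tf.one_hot(y_pred, depth=2))
    else:
        metric.update_state(y_true, y_pred)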
To sum up: there should be a CategoricalF1Score name, and the num_classes argument should be eliminated. BinaryF1Score and F1Score might be further additions.
I think you are right... tf.metrics has e.g.
tf.metrics.Accuracy, tf.metrics.BinaryAccuracy and tf.metrics.CategoricalAccuracy. I think this pattern should be mirrored for tfa.metrics.*F1.
As I understand the tf docs:
- tf.metrics.Accuracy returns the binary accuracy on a categorical vector
- tf.metrics.BinaryAccuracy returns the binary accuracy on a vector of thresholded floats in [0,1]
- tf.metrics.CategoricalAccuracy returns the categorical accuracy on a pseudo-probability one-hot matrix with floats in [0,1]
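A quick sketch of that distinction on toy data (the numbers are arbitrary; the printed values are what I would expect from the docs):
import tensorflow as tf
acc = tf.metrics.Accuracy()               # exact match between label vectors
acc.update_state([1, 0, 0, 1], [1, 1, 0, 0])
print(acc.result().numpy())               # 0.5
bacc = tf.metrics.BinaryAccuracy()        # float scores thresholded at 0.5
bacc.update_state([1, 0, 0, 1], [0.9, 0.6, 0.2, 0.8])
print(bacc.result().numpy())              # 0.75
cacc = tf.metrics.CategoricalAccuracy()   # one-hot labels vs. per-class scores
cacc.update_state([[0, 1], [1, 0]], [[0.3, 0.7], [0.6, 0.4]])
print(cacc.result().numpy())              # 1.0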
It's been too long since I opened this issue, so I'm not quite certain which is which... I guess tfa.metrics.BinaryF1Score is the one I needed, per my example with sklearn.metrics.f1_score. Following the scheme of tf.metrics, I think tfa.metrics.F1Score should be named tfa.metrics.CategoricalF1Score, but I'm not certain.
If we can figure out the names and patterns together, I'm happy to contribute to solving this issue and to aligning these metrics with the similar tf.metrics.
I am currently running into this issue as well. I'm trying to see if I can find a workaround in my own situation, or whether I will have to abandon the F1 score and write my own function or calculate it manually whenever I need it (all of which are clearly suboptimal).
I'm running into a related issue. Whenever I have multi-class predictions with one-hot encoded targets and softmax outputs, I can calculate categorical accuracy with tf.metrics.CategoricalAccuracy, but I cannot calculate F1Score since the input shapes are not compatible (tf.metrics.CategoricalAccuracy requires rank 2 and F1Score only supports rank 1).
import tensorflow as tf
import tensorflow_addons as tfa
y_true = tf.constant([[[1,0,0],[0,0,1]],[[1,0,0],[1,0,0]]])
y_pred = tf.constant([[[1.,0.,0.],[0.,0.4,0.6]],[[0.8,0.2,0.],[0.55,0.45,0.]]])
f1 = tfa.metrics.F1Score(num_classes=6, average=None)
f1.update_state(y_true, y_pred)
f1.result()
This throws the following error:
InvalidArgumentError: Cannot update variable with shape [6] using a Tensor with shape [2,3], shapes must be equal. [Op:AssignAddVariableOp]
Even if I reshape or flatten the data, F1Score would work but CategoricalAccuracy would not work anymore, so I cannot use them simultaneously. I would appreciate it if we could provide a shape argument to F1Score in addition to num_classes.
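For now, the only workaround I can see is to keep two views of the data: the original rank-3 tensors for CategoricalAccuracy and a flattened (N, num_classes) view only for F1Score. An untested sketch, assuming the class scores live on the last axis (so num_classes=3 for my tensors above, not 6):
import tensorflow as tf
import tensorflow_addons as tfa
y_true = tf.constant([[[1, 0, 0], [0, 0, 1]], [[1, 0, 0], [1, 0, 0]]])
y_pred = tf.constant([[[1., 0., 0.], [0., 0.4, 0.6]], [[0.8, 0.2, 0.], [0.55, 0.45, 0.]]])
num_classes = y_true.shape[-1]            # class axis is the last one
cacc = tf.metrics.CategoricalAccuracy()
cacc.update_state(y_true, y_pred)         # rank-3 tensors, unchanged
f1 = tfa.metrics.F1Score(num_classes=num_classes, average=None)
f1.update_state(tf.reshape(y_true, [-1, num_classes]),   # flatten to (N, num_classes)
                tf.reshape(y_pred, [-1, num_classes]))   # only for F1Score
print(cacc.result().numpy())
print(f1.result().numpy())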
A possible workaround for using tfa.metrics.F1Score for binary classification without one-hot encoding is to set num_classes to 1 and threshold to 0.5, i.e.
tfa.metrics.F1Score(num_classes=1, threshold=0.5)
Using the example from @MadsAdrian we get
import tensorflow as tf
import tensorflow_addons as tfa
t, p = [1,1,0,1,1,0], [1,0,1,1,0,1]
y_true = tf.constant(t, shape=(6,1))
y_pred = tf.constant(p, shape=(6,1), dtype=tf.float32)
m = tfa.metrics.F1Score(num_classes=1, threshold=0.5)
m.update_state(y_true, y_pred)
m.result() => <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.5], dtype=float32)>
Thanks for the suggestion! But that workaround wouldn't work for macro or weighted F1-scores, would it?
I have this problem too. I want to use the weighted F1Score on a 2-class problem with binary_crossentropy as the loss (no softmaxed output), so the suggestion from ffraaz is not helpful in my case.
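The only thing I've come up with so far is to expand the sigmoid output into two pseudo-probability columns and one-hot encode the labels, so that the existing F1Score with num_classes=2 and average='weighted' applies. An untested sketch with made-up tensors:
import tensorflow as tf
import tensorflow_addons as tfa
y_true = tf.constant([1, 1, 0, 1, 1, 0])            # binary labels
p = tf.constant([0.9, 0.2, 0.7, 0.8, 0.4, 0.1])     # sigmoid outputs
y_true_2 = tf.one_hot(y_true, depth=2)               # (n, 2) one-hot labels
y_pred_2 = tf.stack([1.0 - p, p], axis=-1)           # (n, 2) class scores
f1 = tfa.metrics.F1Score(num_classes=2, average='weighted')
f1.update_state(y_true_2, y_pred_2)
print(f1.result().numpy())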
Applying the num_classes=1, threshold=0.5 workaround from above, my output is: ValueError: Shapes must be equal rank, but are 1 and 3 for 'AssignAddVariableOp' (op: 'AssignAddVariableOp') with input shapes: [], [2,#height,#width].
Indeed, that workaround doesn't cover macro or weighted averaging: for a binary classification task I'm using num_classes=1, and only average='micro' works.
It also gives an error when passing it to the ModelCheckpoint callback as the monitored metric.
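I suspect that is because, with average=None, F1Score returns a per-class vector rather than a scalar, which ModelCheckpoint cannot compare. A sketch of a possible workaround using a reducing average so result() is a scalar (the toy model, data, and file name below are just placeholders):
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
# toy 3-class data with one-hot labels, as F1Score expects
x = np.random.rand(64, 8).astype('float32')
y = np.eye(3, dtype='float32')[np.random.randint(0, 3, size=64)]
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])
# a reducing average ('micro'/'macro'/'weighted') makes result() a scalar
f1 = tfa.metrics.F1Score(num_classes=3, average='macro', name='f1_macro')
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[f1])
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_weights.h5',
    monitor='val_f1_macro',   # metric name above with the usual 'val_' prefix
    mode='max',
    save_best_only=True,
    save_weights_only=True)   # avoids serializing the custom metric into the file
model.fit(x, y, validation_split=0.25, epochs=2, callbacks=[checkpoint])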