
tfa.metrics.F1Score gives ValueError for binary classification with shape (n,)

Open MadsAdrian opened this issue 5 years ago • 13 comments

System information

  • OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
  • TensorFlow: 2.0.0 via tf docker image ('v2.0.0-rc2-26-g64c3d38')
  • TensorFlow-Addons: 0.6.0 via pip
  • Python version: 3.6.8 [GCC 8.3.0] on linux
  • Is GPU used? (yes/no): tried both

Describe the bug

tfa.metrics.F1Score gives a ValueError for a binary classification problem encoded with shape (n,).

ValueError: Shapes must be equal rank, but are 1 and 0 for 'AssignAddVariableOp' (op: 'AssignAddVariableOp') with input shapes: [], [].

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons as tfa

f1 = tfa.metrics.F1Score(num_classes=2, average=None)

y_true = tf.constant([1,0,0,1,0])
y_pred = tf.constant([1,1,0,0,1])
f1.update_state(y_true, y_pred)

Other info / logs

Would expect this input to work, as it does with tfa.metrics.CohenKappa. If it walks like a tfa.metric and quacks like a tfa.metric, it is a tfa.metric.

E.g.

import tensorflow as tf
import tensorflow_addons as tfa

k = tfa.metrics.CohenKappa(num_classes=2)

y_true = tf.constant([1,0,0,1,0])
y_pred = tf.constant([1,1,0,0,1])
k.update_state(y_true, y_pred)

yields

<tf.Tensor: id=196, shape=(2, 2), dtype=int32, numpy=
array([[1, 2], 
       [1, 1]], dtype=int32)>

MadsAdrian · Dec 07 '19

Let me reproduce and see how it goes

SSaishruthi · Dec 07 '19

@MadsAdrian

Thanks. The computation is different for the two metrics.

  1. In F1Score, the number of columns in your input is equal to the number of classes. With an input of shape (1, 5), the num_classes parameter should have the value 5.

  2. y_pred is expected to have a float data type.

  3. Explicitly providing the shape when constructing the input tensors will resolve this issue.

Please see the below example.

import tensorflow as tf
import tensorflow_addons as tfa

f1 = tfa.metrics.F1Score(num_classes=5, average=None)

y_true = tf.constant([1, 0, 0, 1, 0], shape=(1, 5))
y_pred = tf.constant([1, 1, 0, 0, 1], shape=(1, 5), dtype=tf.float32)
f1.update_state(y_true, y_pred)
print('F1 Score is: ', f1.result().numpy())

Let me know if you need any clarification.

SSaishruthi · Dec 07 '19

Thanks for the clarifying response. This at least gives me a workaround in my code. However, I don't think this fully addresses my inquiry. As stated in the original post, I would expect duck typing to work across the tfa metrics wherever a binary vector of shape (n,) makes sense. My case actually has num_classes=2, and I assumed that inputs of shape (n,) were acceptable (as is the case with tfa.metrics.CohenKappa).

With regards to #430, I suppose the migrated class should be called CategoricalF1Score. See tf.metrics.CategoricalAccuracy and tf.metrics.BinaryAccuracy, which highlight a distinction that has not been made for tfa.metrics.F1Score.

I suppose

The number of columns in the input should be equal to the number of classes, i.e. y_true and y_pred should be one-hot encoded on axis=1.

should be part of the documentation?

The code I'm using looks like this, but as you can see, I did not get it to work with F1Score, as it is really a CategoricalF1Score.

self.change_map_metrics = {
    "ACC": tf.keras.metrics.Accuracy(),
    "cohens kappa": CohenKappa(num_classes=2),
    # "F1": tfa.metrics.F1Score(num_classes=2, average=None),
}
...
# y_true and y_pred are (1,h,w,1) images in {0,1}
y_true, y_pred = tf.reshape(y_true, [-1]), tf.reshape(y_pred, [-1])
for name, metric in self.change_map_metrics.items():
    metric.update_state(y_true, y_pred)

MadsAdrian · Dec 08 '19

Furthermore, the following

import tensorflow as tf
import tensorflow_addons as tfa
t, p = [1,1,0,1,1,0], [1,0,1,1,0,1]
y_true = tf.constant(t, shape=(6,1))
y_pred = tf.constant(p, shape=(6,1), dtype=tf.float32)

m = tfa.metrics.F1Score(num_classes=2, average=None)
m.update_state(y_true, y_pred)

(as expected) produces a ValueError, since y_true and y_pred are not one-hot encoded on axis=1.

oh_y_true = tf.one_hot(t, depth=2)
oh_y_pred = tf.one_hot(p, depth=2)

m.update_state(oh_y_true, oh_y_pred)
m.result() => <tf.Tensor: id=102, shape=(2,), dtype=float32, numpy=array([0. , 0.5], dtype=float32)>

which is as expected with average=None.

The behavior I initially expected is like this:

from sklearn.metrics import f1_score
f1_score(t,p) => 0.5

MadsAdrian · Dec 08 '19

To sum up: there should be a CategoricalF1Score, and the num_classes argument should be eliminated. BinaryF1Score and a plain F1Score might be further additions.

failure-to-thrive · Feb 10 '20

I think you are right... tf.metrics has e.g. tf.metrics.Accuracy, tf.metrics.BinaryAccuracy and tf.metrics.CategoricalAccuracy. I think this pattern should be mirrored for tfa.metrics.*F1.

As I understand the tf docs (see the sketch after this list):

  • tf.metrics.Accuracy returns the binary accuracy on a categorical vector
  • tf.metrics.BinaryAccuracy returns the binary accuracy on a vector of thresholded floats in [0,1]
  • tf.metrics.CategoricalAccuracy returns the categorical accuracy on a pseudo-probability one-hot matrix with floats in [0,1]
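
To make the distinction concrete, a small sketch with made-up example data:

import tensorflow as tf

# Class labels as plain indices.
labels = tf.constant([1, 0, 1])

# tf.metrics.Accuracy: exact match between predicted and true labels.
acc = tf.keras.metrics.Accuracy()
acc.update_state(labels, tf.constant([1, 0, 0]))

# tf.metrics.BinaryAccuracy: sigmoid scores, thresholded at 0.5 by default.
bin_acc = tf.keras.metrics.BinaryAccuracy()
bin_acc.update_state(labels, tf.constant([0.8, 0.3, 0.4]))

# tf.metrics.CategoricalAccuracy: one-hot targets vs. per-class scores (argmax).
cat_acc = tf.keras.metrics.CategoricalAccuracy()
cat_acc.update_state(tf.constant([[0., 1.], [1., 0.], [0., 1.]]),
                     tf.constant([[0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]))

print(acc.result().numpy(), bin_acc.result().numpy(), cat_acc.result().numpy())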

It's been too long since I opened this issue, so I'm not quite certain which is which... I guess tfa.metrics.BinaryF1Score was the one I needed, per my example with sklearn.metrics.f1_score. Following the tf.metrics naming scheme, I think tfa.metrics.F1Score should be named tfa.metrics.CategoricalF1Score, but I'm not certain.
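
A hypothetical BinaryF1Score along these lines could be a thin wrapper over the existing metric; this is just a sketch of the idea, not a class that exists in tfa today:

import tensorflow as tf
import tensorflow_addons as tfa

class BinaryF1Score(tfa.metrics.F1Score):
    """Hypothetical wrapper: scalar F1 for a single binary output."""

    def __init__(self, threshold=0.5, name='binary_f1_score', **kwargs):
        super().__init__(num_classes=1, average='micro', threshold=threshold,
                         name=name, **kwargs)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Accept (n,) vectors as well as (n, 1) columns.
        y_true = tf.reshape(y_true, [-1, 1])
        y_pred = tf.reshape(y_pred, [-1, 1])
        return super().update_state(y_true, y_pred, sample_weight)

m = BinaryF1Score()
m.update_state(tf.constant([1, 1, 0, 1, 1, 0]),
               tf.constant([1., 0., 1., 1., 0., 1.]))
print(m.result().numpy())  # 0.5, matching sklearn.metrics.f1_score(t, p) above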

If we can figure out the names and patterns together, I'm happy to contribute to solving this issue and streamlining these metrics toward the corresponding tf.metrics.

MadsAdrian · Feb 11 '20

I am currently running into this issue as well. I'm trying to see if I can find a workaround for my situation, or whether I will have to abandon the F1 score and write my own function or calculate it manually whenever I need it (all of which are clearly suboptimal).

mgmverburg · Mar 23 '20

I'm running into a related issue. When I have multi-class predictions that are one-hot encoded and computed with softmax, I can calculate categorical accuracy with tf.metrics.CategoricalAccuracy, but I cannot calculate the F1 score, since the input shapes the two metrics accept are not compatible (tf.metrics.CategoricalAccuracy handles my rank-3 inputs, but F1Score does not).

import tensorflow as tf
import tensorflow_addons as tfa

y_true = tf.constant([[[1,0,0],[0,0,1]],[[1,0,0],[1,0,0]]])
y_pred = tf.constant([[[1.,0.,0.],[0.,0.4,0.6]],[[0.8,0.2,0.],[0.55,0.45,0.]]])

f1 = tfa.metrics.F1Score(num_classes=6, average=None)
f1.update_state(y_true, y_pred)
f1.result()

This throws the error:

InvalidArgumentError: Cannot update variable with shape [6] using a Tensor with shape [2,3], shapes must be equal. [Op:AssignAddVariableOp]

Even if I reshape or flatten the data so that F1Score works, CategoricalAccuracy no longer works, so I cannot use them simultaneously. I would appreciate it if F1Score could take a shape argument in addition to num_classes.
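
One possible workaround, sketched from the example above (assuming the class axis is last and num_classes matches its size, i.e. 3 here rather than 6): fold the leading dimensions into the batch axis, giving a (samples, classes) shape that both metrics accept.

import tensorflow as tf
import tensorflow_addons as tfa

y_true = tf.constant([[[1, 0, 0], [0, 0, 1]], [[1, 0, 0], [1, 0, 0]]])
y_pred = tf.constant([[[1., 0., 0.], [0., 0.4, 0.6]], [[0.8, 0.2, 0.], [0.55, 0.45, 0.]]])

num_classes = 3  # size of the last (class) axis
y_true_2d = tf.reshape(y_true, [-1, num_classes])  # (batch * steps, classes)
y_pred_2d = tf.reshape(y_pred, [-1, num_classes])

cat_acc = tf.keras.metrics.CategoricalAccuracy()
f1 = tfa.metrics.F1Score(num_classes=num_classes, average=None)

cat_acc.update_state(y_true_2d, y_pred_2d)
f1.update_state(y_true_2d, y_pred_2d)
print(cat_acc.result().numpy(), f1.result().numpy())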

raqueldias · May 22 '20

A possible workaround for using tfa.metrics.F1Score for binary classification without one-hot encoding is to set num_classes to 1 and threshold to 0.5, i.e.

tfa.metrics.F1Score(num_classes=1, threshold=0.5)

Using the example from @MadsAdrian, we get:

import tensorflow as tf
import tensorflow_addons as tfa
t, p = [1,1,0,1,1,0], [1,0,1,1,0,1]
y_true = tf.constant(t, shape=(6,1))
y_pred = tf.constant(p, shape=(6,1), dtype=tf.float32)

m = tfa.metrics.F1Score(num_classes=1, threshold=0.5)
m.update_state(y_true, y_pred)
m.result() => <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.5], dtype=float32)>

ffraaz · Jun 14 '20

Thanks for the suggestion! But that workaround wouldn't work for macro or weighted F1 scores, would it?

raqueldias · Jul 24 '20

I have this problem too. I want to use the weighted F1Score on a 2-class problem with binary_crossentropy as the loss (no softmaxed output), so the num_classes=1, threshold=0.5 suggestion from ffraaz above is not helpful in my case.

When I try it, my output is: ValueError: Shapes must be equal rank, but are 1 and 3 for 'AssignAddVariableOp' (op: 'AssignAddVariableOp') with input shapes: [], [2,#height,#width].
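
A possible way around this, as a rough sketch (assuming a single-channel sigmoid output and labels in {0, 1}): subclass F1Score so that update_state flattens the pixels and expands the single channel into the two-column layout the metric expects, which then also works with average='weighted'.

import tensorflow as tf
import tensorflow_addons as tfa

class SigmoidF1Score(tfa.metrics.F1Score):
    """Sketch: adapt single-channel sigmoid outputs of any shape to the
    two-column (n, num_classes) layout tfa.metrics.F1Score expects."""

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
        y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
        # Columns: [P(class 0), P(class 1)]; labels become one-hot the same way.
        y_true = tf.stack([1.0 - y_true, y_true], axis=-1)
        y_pred = tf.stack([1.0 - y_pred, y_pred], axis=-1)
        # (per-pixel sample_weight handling is not addressed in this sketch)
        return super().update_state(y_true, y_pred, sample_weight)

m = SigmoidF1Score(num_classes=2, average='weighted')
m.update_state(tf.constant([[1, 0], [0, 1]]),          # tiny 'image' of labels
               tf.constant([[0.9, 0.2], [0.4, 0.7]]))  # sigmoid outputs
print(m.result().numpy())  # scalar weighted F1 over the two classes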

davideboschetto · Aug 04 '20

Thanks for the suggestion! But that workaround wouldn't work for macro or weighted F1 scores, would it?

Indeed. For a binary classification task I'm using num_classes=1, and only average='micro' works.

MasterJEET · Apr 20 '21

F1Score also gives an error when passed to the ModelCheckpoint callback as a monitor metric.

ma7555 · Jan 23 '23
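
One likely cause: with average=None the metric returns a vector of per-class scores, while ModelCheckpoint needs a single scalar to compare. A sketch of monitoring a named, scalar-averaged F1Score (model and data names below are placeholders):

import tensorflow as tf
import tensorflow_addons as tfa

# Scalar F1 (micro average over the single binary output), named 'f1' for logging.
f1 = tfa.metrics.F1Score(num_classes=1, threshold=0.5, average='micro', name='f1')

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[f1])

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5', monitor='val_f1', mode='max', save_best_only=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[checkpoint])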