tf.keras.metrics.MeanIoU API is practically unusable without a threshold
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): pip3
- TensorFlow version (use command below): 2.1.0
- Python version: 3.6
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
Describe the current behavior
tf.keras.metrics.MeanIoU's constructor does not take a threshold or list of thresholds as an input argument. This is not only inconsistent with the API used by other metrics (e.g. tf.keras.metrics.TruePositives, tf.keras.metrics.FalseNegatives), it also renders the metric practically unusable, because the outputs (i.e. predictions) from a network are generally probability values in the range 0 to 1, not perfect 0 or 1 values. Unless the constructor takes thresholds as an argument and applies them to the predictions before computing the IoU, the metric is practically useless: it always ends up showing 0.5, 0.25, or whatever the baseline random-guess IoU happens to be in a given problem.
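A minimal sketch of the failure mode (illustrative values; the result assumes TF truncates float predictions to integer class ids when building the internal confusion matrix):

import tensorflow as tf

m = tf.keras.metrics.MeanIoU(num_classes=2)
# Sigmoid-style probabilities instead of hard 0/1 class labels:
m.update_state([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.8])
# Every probability below 1.0 is truncated to class 0, so the metric
# collapses to the baseline value (0.25 here) no matter how good the model is.
print(m.result().numpy())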
Describe the expected behavior
tf.keras.metrics.MeanIoU's constructor should take threshold values as input and apply them to the predictions before computing the IoU.
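If the constructor accepted thresholds, usage would look something like the sketch below (hypothetical API; the threshold argument is the proposed addition and does not exist in TF 2.1):

import tensorflow as tf

m = tf.keras.metrics.MeanIoU(num_classes=2, threshold=0.5)  # hypothetical argument
m.update_state([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.8])  # would be binarized to [0, 0, 1, 1] first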
Standalone code to reproduce the issue: none required, because the docs (https://www.tensorflow.org/api_docs/python/tf/keras/metrics/MeanIoU) prove the point by themselves; the only example they show uses predictions that are already binary values.
Other info / logs: none.
@dd1923, on running the usage example given in the MeanIoU documentation, the output I got matched the example. Please find the gist of it here.
Could you please provide a minimal code sample to reproduce the issue reported here? Thanks!
@amahendrakar That was my point; did you read my post above? MeanIoU, implemented as a metric, is pretty much unusable without a threshold. Try replacing the 1 values in pred with 0.9 in your example and see the output.
Was able to reproduce the issue with TF v2.1, TF v2.2.0-rc4 and TF-nightly. Please find the attached gist. Thanks!
@pavithrasv any idea when this will be addressed? It seems like a minor change on the API side, but it carries a big impact on usability.
@dd1923 do you need a threshold, or something like argmax that chooses the index with the maximum probability?
A threshold is needed to stay consistent with the rest of the API. See the TruePositives, FalsePositives, etc. implementations: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/TruePositives?hl=TR
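For comparison, a minimal sketch of the thresholded API those metrics already expose:

import tensorflow as tf

tp = tf.keras.metrics.TruePositives(thresholds=0.5)
tp.update_state([0, 1, 1, 1], [0.1, 0.8, 0.6, 0.4])
print(tp.result().numpy())  # 2.0 -- only the 0.8 and 0.6 predictions clear the threshold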
I'm not sure this can be consistent -- it looks like TruePositives requires y_pred to be probabilities, i.e. [batch_size, HW, n_classes], which is where a threshold makes sense to label the output as either 0 or 1. Meanwhile, MeanIoU requires y_pred to be predicted class ids, i.e. [batch_size, HW].
I'd rather think the right way is to allow this metric to accept y_pred as probabilities and do an argmax under the hood.
First, at least the docs of both TruePositives and MeanIoU describe y_pred simply as "the predicted values".
Second, the docs define IoU as follows: IOU = true_positive / (true_positive + false_positive + false_negative). So it seems logical that if all the inputs of the IoU formula (namely the true positives, false positives and false negatives) work on thresholds, then the resulting metric should also work on thresholds.
Third, how would argmax handle the case of having a 1 in more than one output class?
I think something along the lines of the following in the update_state method would do the trick, applied right before the current update_state implementation runs (at least for the case where threshold is a single value):
y_pred = tf.where(condition=tf.math.greater(y_pred, tf.cast(threshold, y_pred.dtype)), x=tf.cast(1.0, y_pred.dtype), y=y_pred)
y_pred = tf.where(condition=tf.math.less_equal(y_pred, tf.cast(threshold, y_pred.dtype)), x=tf.cast(0.0, y_pred.dtype), y=y_pred)
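For a single scalar threshold, I believe those two tf.where calls collapse to one cast:

y_pred = tf.cast(y_pred > threshold, y_pred.dtype)  # True/False -> 1.0/0.0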
OK, you want multilabel support; in that case having a threshold makes sense.
@dd1923 Sorry about the delay. Would you be interested in sending a PR for this, with test cases? I'll be happy to review and merge the change.
A potential workaround I found on Stack Overflow (https://stackoverflow.com/questions/60507120/how-to-correctly-use-the-tensorflow-meaniou-metric):
class MyMeanIOU(tf.keras.metrics.MeanIoU):
    def update_state(self, y_true, y_pred, sample_weight=None):
        # Convert per-class probabilities to class ids before delegating to MeanIoU
        return super().update_state(y_true, tf.argmax(y_pred, axis=-1), sample_weight)
In this case my y_true was a mask of shape (batch, 256, 256, 1), where the pixel values in the last dimension were 0, 1 or 2, and my y_pred was of shape (batch, 256, 256, 3). This way the argmax maps probabilities to a class value. Hope that helps!
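A usage sketch under the same assumptions (3 classes, sparse integer masks in y_true, and a model variable holding the segmentation network):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=[MyMeanIOU(num_classes=3)])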
@pavithrasv I've opened #47410, could you take a look and share your thoughts on the best way to make the changes to the MeanIoU interface?
Hi guys,
I have run into a new issue with MeanIoU.
I am using TF 2.8 and Python 3.9 on Windows 10.
I am running a simple semantic-segmentation model (3 classes; background: 0, green objects: 1, red objects: 2).
I also used y_train = to_categorical(y_train, num_classes=3).
The metrics show that the model is learning, but mean_io_u is always 0.3342 (0.3333 on validation), even though the reported tp, fp and fn counts imply the IoU should be different.
Could you please let me know what I am missing?
Epoch 1/20
47/47 - 39s - loss: 1.1312 - tp: 15429676.0000 - fp: 14302729.0000 - tn: 84006816.0000 - fn: 34110008.0000 - mean_io_u: 0.3342 - val_loss: 1.0479 - val_tp: 93273.0000 - val_fp: 13260.0000 - val_tn: 42716212.0000 - val_fn: 21271462.0000 - val_mean_io_u: 0.3333 - 39s/epoch - 820ms/step
Epoch 2/20
47/47 - 27s - loss: 0.8477 - tp: 24225640.0000 - fp: 8459446.0000 - tn: 88540744.0000 - fn: 24657304.0000 - mean_io_u: 0.3342 - val_loss: 0.9835 - val_tp: 375734.0000 - val_fp: 24419.0000 - val_tn: 42705056.0000 - val_fn: 20989004.0000 - val_mean_io_u: 0.3333 - 27s/epoch - 575ms/step
Epoch 3/20
47/47 - 25s - loss: 0.6144 - tp: 33600268.0000 - fp: 4546788.0000 - tn: 92456352.0000 - fn: 15279731.0000 - mean_io_u: 0.3342 - val_loss: 0.7825 - val_tp: 3682112.0000 - val_fp: 62991.0000 - val_tn: 42666480.0000 - val_fn: 17682624.0000 - val_mean_io_u: 0.3333 - 25s/epoch - 531ms/step
Epoch 4/20
47/47 - 23s - loss: 0.4546 - tp: 39932232.0000 - fp: 2521113.0000 - tn: 94480768.0000 - fn: 8949028.0000 - mean_io_u: 0.3342 - val_loss: 0.5884 - val_tp: 16064998.0000 - val_fp: 105517.0000 - val_tn: 42623952.0000 - val_fn: 5299738.0000 - val_mean_io_u: 0.3333 - 23s/epoch - 484ms/step
Epoch 5/20
47/47 - 28s - loss: 0.3440 - tp: 43877272.0000 - fp: 1494296.0000 - tn: 95507416.0000 - fn: 5004135.0000 - mean_io_u: 0.3342 - val_loss: 0.3385 - val_tp: 20455492.0000 - val_fp: 135860.0000 - val_tn: 42593616.0000 - val_fn: 909244.0000 - val_mean_io_u: 0.3333 - 28s/epoch - 605ms/step
Epoch 6/20
47/47 - 27s - loss: 0.2719 - tp: 45860256.0000 - fp: 975254.0000 - tn: 96026592.0000 - fn: 3021046.0000 - mean_io_u: 0.3342 - val_loss: 0.2615 - val_tp: 20886848.0000 - val_fp: 140094.0000 - val_tn: 42589384.0000 - val_fn: 477889.0000 - val_mean_io_u: 0.3333 - 27s/epoch - 567ms/step
Here is the code I am running:
# Imports assumed by the snippet (not shown in the original comment)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split

# X_train, y_train, args and DeepLabV3Plus are defined elsewhere
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

data_gen_args = dict(horizontal_flip=True,
                     rotation_range=20,
                     vertical_flip=True)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

seed = 1
bs = 16
image_generator = image_datagen.flow(X_train, seed=seed, batch_size=bs, shuffle=True)
mask_generator = mask_datagen.flow(y_train, seed=seed, batch_size=bs, shuffle=True)
train_generator = zip(image_generator, mask_generator)

METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.MeanIoU(num_classes=3),
]

input_shape = (args.input_size, args.input_size, 3)
model = DeepLabV3Plus(input_shape)
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=METRICS)

history = model.fit(train_generator,
                    verbose=2,
                    epochs=args.epochs,
                    steps_per_epoch=(len(X_train) // bs),
                    validation_data=(X_val, y_val),
                    shuffle=False)
I have solved this problem by replacing MeanIoU with OneHotIoU.
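For reference, a minimal sketch of that replacement (OneHotIoU ships in newer TF releases and accepts one-hot y_true with probability y_pred directly; this assumes the 3-class setup and the opt optimizer from the code above):

metric = tf.keras.metrics.OneHotIoU(num_classes=3, target_class_ids=[0, 1, 2])
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=[metric])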
Would you mind contributing this in KerasCV instead?
How can I do that?
Make a PR on https://github.com/keras-team/keras-cv adding a MeanIoU that corrects this.
Hi,
Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information in it may no longer be relevant to the current state of the code base.
The TensorFlow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.
Please follow the release notes to stay up to date with the latest developments happening in the TensorFlow space.
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.