
BinaryCrossentropy should support a way to reduce over NO axes

Open LukeWood opened this issue 3 years ago • 7 comments

Currently, we do not support a way to reduce over no axes in backend.mean.

Given the reduction at https://github.com/keras-team/keras/blob/c9068087d9142bab573e0c300bf9874a957accff/keras/losses.py#L2162, this means we cannot use the Keras BinaryCrossentropy loss for the YOLOX model.

@quantumalvya needs a way to avoid reducing over any axis. Let's support this in the backend.mean() function: https://github.com/keras-team/keras/blob/c9068087d9142bab573e0c300bf9874a957accff/keras/backend.py#L2899

LukeWood avatar Nov 01 '22 19:11 LukeWood

Under the hood it uses TF reduce_mean: https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean

where axis=None means "all axes".

So what do you mean by "avoid reducing over any axis"?
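To make the three cases concrete, here is a minimal sketch. It uses NumPy's np.mean as a stand-in, on the assumption that it treats axis=None the same way tf.reduce_mean is documented to (reduce all dimensions). The array values are made up for illustration.

```python
import numpy as np

# Stand-in per-element losses with shape (batch=2, anchors=3).
per_element = np.array([[0.1, 0.2, 0.3],
                        [0.4, 0.5, 0.6]])

all_axes = np.mean(per_element, axis=None)  # reduces every axis, giving a scalar
last_axis = np.mean(per_element, axis=-1)   # reduces only the last axis
no_reduction = per_element                  # "no axes": the tensor is left untouched

print(np.shape(all_axes))   # ()
print(last_axis.shape)      # (2,)
print(no_reduction.shape)   # (2, 3)
```

So axis=None cannot double as "no reduction" without changing what it already means.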

bhack avatar Nov 03 '22 13:11 bhack

@LukeWood can you expand on the requirement here?

Perhaps some example code demonstrating this need would be useful. I'm curious why we would need to use binary_crossentropy as a loss without any reduction. Can the use case be solved using keras.backend.binary_crossentropy, which doesn't perform reduction?

ianstenbit avatar Nov 03 '22 14:11 ianstenbit

Sarvagya can you comment here? (pinged him a link as I can't tag him yet)

LukeWood avatar Nov 03 '22 18:11 LukeWood

where axis=None means "all axes".

So what do you mean by "avoid reducing over any axis"?

As in, the loss should be returned without taking the mean over any axis at all. axis=None means reducing over all axes, but we require no reduction for YoloX.

Perhaps some example code demonstrating this need would be useful.

I am currently implementing this in the YoloX implementation as follows:

        # Element-wise loss; no reduction has been applied yet.
        loss = tf.keras.backend.binary_crossentropy(
            y_true, y_pred, from_logits=self.from_logits
        )
        if self.axis is not None:
            # Average only over the requested axis.
            return tf.reduce_mean(loss, axis=self.axis)
        # Here axis=None means "no reduction", unlike in tf.reduce_mean.
        return loss

I'm curious why we would need to use binary_crossentropy as a loss without any reduction. Can the use case be solved using keras.backend.binary_crossentropy, which doesn't perform reduction?

YoloX sums over the axis internally and then averages based on the number of positive boxes (as calculated by SimOTA). If we took the mean over the number of examples, the result would be inaccurate. keras.backend.binary_crossentropy can achieve this, yes. However, we want to offer support for losses passed through compile(). For now, I have made an internal loss as part of my implementation.
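A rough sketch of why the two normalizations differ, written in NumPy so it does not depend on any particular Keras version. The helper elementwise_bce, the toy labels and predictions, and taking the positive count from y_true are all assumptions for illustration. The real count would come from SimOTA, and this is not the actual YoloX loss.

```python
import numpy as np

def elementwise_bce(y_true, y_pred, eps=1e-7):
    """Per-element binary cross-entropy on probabilities, with no reduction."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Hypothetical batch: 2 images, 4 anchors each.
y_true = np.array([[1.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0, 0.0]])
y_pred = np.array([[0.8, 0.1, 0.2, 0.1],
                   [0.7, 0.6, 0.3, 0.2]])

loss = elementwise_bce(y_true, y_pred)   # shape (2, 4), unreduced

# Assumption: the label-assignment step marked these 3 anchors as positive.
num_positive = max(y_true.sum(), 1.0)

normalized_by_positives = loss.sum() / num_positive  # divides by 3
mean_over_elements = loss.mean()                     # divides by 8
```

Both values come from the same summed loss, but the divisors differ (3 versus 8 here), which is why a mean applied inside the loss cannot simply be undone unless the element count is known.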

quantumalaviya avatar Nov 03 '22 19:11 quantumalaviya

Sarvagya can you comment here? (pinged him a link as I can't tag him yet)

It was just misspelled.

YoloX sums over the axis internally and then averages based on the number of positive boxes (as calculated by SimOTA). If we took the mean over the number of examples, the result would be inaccurate. keras.backend.binary_crossentropy can achieve this, yes. However, we want to offer support for losses passed through compile(). For now, I have made an internal loss as part of my implementation.

I think it makes more sense to create another BinaryCrossentropy variant, like BinaryCrossentropyFocal, or to add an extra parameter to BinaryCrossentropy that makes the mean optional, instead of modifying mean.
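As a sketch of the extra-parameter option: the function below is plain NumPy, and the reduce_mean flag is a hypothetical name, not an existing Keras argument. It only illustrates the shape of the API, assuming the loss normally averages over a single axis.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, axis=-1, reduce_mean=True, eps=1e-7):
    """BCE on probabilities. `reduce_mean` is a hypothetical flag for illustration."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    if not reduce_mean:
        # Caller asked for the raw per-element loss.
        return bce
    # Default behaviour: average over the given axis.
    return np.mean(bce, axis=axis)

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.2], [0.3, 0.6]])

reduced = binary_crossentropy(y_true, y_pred)                       # shape (2,)
unreduced = binary_crossentropy(y_true, y_pred, reduce_mean=False)  # shape (2, 2)
```

This keeps the meaning of axis=None in mean unchanged and makes the "no reduction" case explicit at the loss level.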

YoloX sums over the axis internally

Does this mean that you could use reduce_sum and then divide by positive_boxes?

bhack avatar Nov 03 '22 20:11 bhack

I think it makes more sense to create another BinaryCrossentropy variant, like BinaryCrossentropyFocal, or to add an extra parameter to BinaryCrossentropy that makes the mean optional, instead of modifying mean.

Does this mean that you could use reduce_sum and then divide by positive_boxes?

Yes. For now, I have implemented an internal loss specifically for YoloX which does a reduce_sum followed by division by the number of positive boxes.

quantumalaviya avatar Nov 03 '22 21:11 quantumalaviya

Yes, and in the official reference implementation the reduction is also handled at the level of the BinaryCrossentropy API rather than in mean:

https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/models/yolo_head.py#L493-L495

bhack avatar Nov 03 '22 21:11 bhack