keras-cv
Fix `ChannelShuffle`
Close https://github.com/keras-team/keras-cv/issues/259
Was this done for ShuffleNet?
I didn't test with the actual ShuffleNet model, but I tested with this model and it works fine so far. So I think it should also work in general.
What I meant is that ShuffleNet doesn't have this randomization requirement, so probably we need to do something different.
I don't think we care about gradients in KPLs, since we already have many other ops without gradients in KPLs, so I don't think we are tracking this.
> What I meant is that ShuffleNet doesn't have this randomization requirement, so probably we need to do something different.

Like this?

```python
tf.random.set_seed(self.seed)
tf.random.uniform(shape=[self.group], seed=self.seed)
```
Yes, like:
https://github.com/tensorpack/tensorpack/blob/master/examples/ImageNetModels/shufflenet.py#L40-L49
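For context, the linked tensorpack helper boils down to the classic reshape–transpose–reshape trick. A minimal channels-last sketch (the tensorpack original is channels-first, and `channel_shuffle` here is just an illustrative name, not keras-cv API):

```python
import tensorflow as tf

def channel_shuffle(x, groups):
    """ShuffleNet-style static channel shuffle (no randomness)."""
    # Split the channel axis into `groups`, swap the group axis with the
    # per-group channel axis, then flatten back. Assumes [N, H, W, C]
    # input with C divisible by `groups`.
    n, h, w, c = tf.unstack(tf.shape(x))
    x = tf.reshape(x, [n, h, w, groups, c // groups])
    x = tf.transpose(x, [0, 1, 2, 4, 3])
    return tf.reshape(x, [n, h, w, c])
```

E.g. `channel_shuffle(tf.random.normal([2, 8, 8, 6]), groups=3)` deterministically interleaves the three channel groups.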
Yes. I found that earlier but couldn't adopt it, because initially this layer was proposed as an input channel shuffle and randomness was a must. However, now I think we can use `tf.random.set_seed` to make it static, i.e. a one-time draw for each instance of the channel shuffle layer.
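A minimal sketch of that idea, drawing the permutation once at build time with a seeded NumPy generator (the layer name `StaticChannelShuffle` and its seed handling are hypothetical, not keras-cv API):

```python
import numpy as np
import tensorflow as tf

class StaticChannelShuffle(tf.keras.layers.Layer):
    """Hypothetical layer: the channel permutation is drawn once per
    instance, so every call reuses it and the op stays deterministic."""

    def __init__(self, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.seed = seed

    def build(self, input_shape):
        # One-time draw keyed on the seed; assumes a static channel count.
        rng = np.random.default_rng(self.seed)
        self.permutation = tf.constant(rng.permutation(input_shape[-1]))

    def call(self, inputs):
        # A plain gather, which is differentiable w.r.t. `inputs`.
        return tf.gather(inputs, self.permutation, axis=-1)
```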
Let's see if we could unify the two use cases. But it isn't clear to me if we want to reuse the KPL namespace in differentiable layers...
I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors. Can we maintain the original impl for the KPL?
> Let's see if we could unify the two use cases. But it isn't clear to me if we want to reuse the KPL namespace in differentiable layers...

I think it is fine. Layers are imported from `keras_cv.layers.*`; where the code lives is less important. ShuffleNet is a rare case.
> I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors. Can we maintain the original impl for the KPL?

Thanks for the note: can we clarify this point? How does the behavior change? This may be going over my head here.
> I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors. Can we maintain the original impl for the KPL?

> Thanks for the note: can we clarify this point? How does the behavior change? This may be going over my head here.

What are we trying to fix with this PR? The missing gradient? Avoiding the vmap fallback? XLA coverage? Something else?
> I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors. Can we maintain the original impl for the KPL?

> Thanks for the note: can we clarify this point? How does the behavior change? This may be going over my head here.

> What are we trying to fix with this PR? The missing gradient? Avoiding the vmap fallback? XLA coverage? Something else?

I imagine the missing gradient, but that is a separate discussion IMO.

What behavior mismatches here? Functionally the layer does the same thing, right? Or do I misunderstand?
@bhack

> I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors.

The operation will have a subtle difference if we want to maintain two separate implementations, and perhaps the naming too. Also, the current implementation uses the transpose operation twice, while I think the new PR is lighter. What do you think?
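Purely for illustration (this is not the actual diff in this PR), a gather-based variant that avoids the transpose round trip might look like this, assuming channels-last input with a static channel count divisible by `groups`:

```python
import tensorflow as tf

def gather_based_shuffle(x, groups, seed=None):
    # Permute whole channel groups with a single gather, instead of a
    # reshape -> shuffle -> transpose -> transpose round trip.
    channels = x.shape[-1]          # assumes a static channel count
    group_size = channels // groups
    order = tf.random.shuffle(tf.range(groups), seed=seed)
    # Expand the random group order into per-channel indices.
    indices = tf.reshape(
        order[:, None] * group_size + tf.range(group_size)[None, :], [-1]
    )
    return tf.gather(x, indices, axis=-1)
```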
> What I meant is that ShuffleNet doesn't have this randomization requirement, so probably we need to do something different.

Like this?

```python
tf.random.set_seed(self.seed)
tf.random.uniform(shape=[self.group], seed=self.seed)
```

@LukeWood Is it ok to use the above approach?
> @bhack
>
> I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors.
>
> The operation will have a subtle difference if we want to maintain two separate implementations, and perhaps the naming too. Also, the current implementation uses the transpose operation twice, while I think the new PR is lighter. What do you think?

Can we clarify what the differences are? I am not convinced we need two implementations unless the differences are significant.
> Can we clarify what the differences are?

Two points:

- I've linked above a quite popular TF implementation from that time, and if I remember correctly from the paper (but I need to check it again), `channel_shuffle` has no random concept in ShuffleNet.
- Correct me if I am wrong: from an API point of view, as our KPL layers are preprocessing layers, they are not guaranteed to be differentiable. So it could be a little strange to use a layer from the KPL namespace in a "learnable" section of the network graph.
@bhack

> Correct me if I am wrong: from an API point of view, as our KPL layers are preprocessing layers, they are not guaranteed to be differentiable. So it could be a little strange to use a layer from the KPL namespace in a "learnable" section of the network graph.

I think it's fine to consider it a special case. The static channel shuffle operation is offered in PyTorch officially, but not as a random layer for image augmentation; the albumentations library does offer it that way. Whether it's treated as a differentiable layer or a preprocessing layer, in both cases it has to subclass the layer module. HERE was a relevant query. Also, I think the KPL layers may contain differentiable layers in the future.
> I think the KPL layers may contain differentiable layers in the future.

Isn't this point partially the root cause of this PR? If we are going in this direction, we need to care every time about using differentiable ops or not when we contribute a new component in the `preprocessing` folder.

Your detection emerged with a sort of "e2e integration test" (https://github.com/keras-team/keras-cv/pull/218#discussion_r840739049):

- we currently don't have these integration tests in the repo;
- it emerged because we had a minor bug in TF for that specific op, fixed by https://github.com/tensorflow/tensorflow/pull/55529.
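The missing gradient is easy to reproduce. Depending on the TF version, the snippet below either raises a `LookupError` (no gradient registered for the `RandomShuffle` op) or, after the linked fix registers it as not differentiable, returns `None`:

```python
import tensorflow as tf

x = tf.Variable(tf.ones([4]))
with tf.GradientTape() as tape:
    y = tf.random.shuffle(x)
# Before tensorflow/tensorflow#55529 this lookup failed because no
# gradient was registered for RandomShuffle; with the op registered as
# NotDifferentiable, the gradient is simply None.
print(tape.gradient(y, x))
```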
From what I can tell, I think ShuffleNet is a super rare one-off.
> From what I can tell, I think ShuffleNet is a super rare one-off.

Also, if we consider this a quite rare case (I am not so future-proof on this), for API consistency we need to consider that this approach will require removing the `Random` prefix from the name even if it has the (optional) random logic. For all the other KPLs we used the `Random` prefix when there was random logic. Instead, for this one, if we use the dual scope, we need to control its randomness with an extra param. I think we are losing a little bit of coherence.

> I think it's fine to consider it a special case. The static channel shuffle operation is offered in PyTorch officially, but not as a random layer for image augmentation.

About PyTorch, there is a more general request for a random shuffle op with an axis param:
https://github.com/pytorch/pytorch/issues/71409
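For comparison, that kind of axis-aware shuffle can already be composed by hand in TF; a rough sketch of what the linked request asks for as a built-in:

```python
import tensorflow as tf

def shuffle_along_axis(x, axis, seed=None):
    # Shuffle `x` along an arbitrary axis by gathering with a randomly
    # permuted index vector.
    idx = tf.random.shuffle(tf.range(tf.shape(x)[axis]), seed=seed)
    return tf.gather(x, idx, axis=axis)
```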
@bhack cc. @LukeWood

> Also, if we consider this a quite rare case (I am not so future-proof on this), for API consistency we need to consider that this approach will require removing the `Random` prefix from the name even if it has the (optional) random logic.

IMO, it's not required to add a `Random` prefix if we consider it a special case. Curious: will `keras_cv` always provide only the random transformation layer?

I'm OK with having two separate layers if that's accepted into `keras_cv`, but that would also feel a bit odd and the naming might be confusing. I'm more interested in unifying the two cases, i.e. the random operation as a KPL and the static operation as a model layer.
> Curious: will `keras_cv` always provide only the random transformation layer?

I think this part of the topic was covered earlier at https://github.com/keras-team/keras-cv/pull/122#discussion_r803214729. IMHO, we did not get any particular outcome from that discussion.
> From what I can tell, I think ShuffleNet is a super rare one-off.
>
> Also, if we consider this a quite rare case (I am not so future-proof on this), for API consistency we need to consider that this approach will require removing the `Random` prefix from the name even if it has the (optional) random logic. For all the other KPLs we used the `Random` prefix when there was random logic. Instead, for this one, if we use the dual scope, we need to control its randomness with an extra param. I think we are losing a little bit of coherence.
>
> I think it's fine to consider it a special case. The static channel shuffle operation is offered in PyTorch officially, but not as a random layer for image augmentation.
>
> About PyTorch, there is a more general request for a random shuffle op with an axis param:

Ok, yeah, if there is no random behavior in one case and it is fully random in the other, it should really be a different layer. I misunderstood the original paper and the meaning of ShuffleNet.
> Ok, yeah, if there is no random behavior in one case and it is fully random in the other, it should really be a different layer. I misunderstood the original paper and the meaning of ShuffleNet.

It was the original point, https://github.com/keras-team/keras-cv/issues/259#issuecomment-1116568351, 13 days ago :wink:
Closing for now, as this does not seem to do the same thing as `RandomChannelShuffle()`. Instead, this appears to shuffle channels uniformly in order to support ShuffleNet. If we ever support ShuffleNet, we can introduce a one-off layer for that model architecture, in a gradient-friendly approach.

Thank you for your enthusiasm as always!