PCGrad
shuffle stacked loss
Consider replacing tf.random.shuffle(loss) with loss = tf.gather(loss, tf.random.shuffle(tf.range(tf.shape(loss)[0]))).
Hi @cfifty, may I ask why not replace it with loss = tf.random.shuffle(loss)?
- In non-eager (graph-mode) TensorFlow, the result of a bare tf.random.shuffle(loss) call is never used, so the op never executes and the loss list is not shuffled.
- If you use loss = tf.random.shuffle(loss), the backward pass of tf.random.shuffle is not defined, so you can't compute gradients through this operation and an error is thrown (a short sketch follows this list). See https://stackoverflow.com/questions/55701407/how-to-shuffle-tensor-in-tensorflow-errorno-gradient-defined-for-operation-ra for additional context.
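For concreteness, here is a minimal sketch of both points, assuming TF 1.x graph mode as in this thread (the toy variables and values are made up for illustration):

```python
import tensorflow as tf  # sketch assumes TF 1.x (graph mode)

# A toy stacked loss built from two trainable variables.
v1 = tf.Variable(1.0)
v2 = tf.Variable(2.0)
loss = tf.stack([tf.square(v1), tf.square(v2)])

# Differentiating through tf.random.shuffle fails, since no gradient is
# registered for the RandomShuffle op (see the StackOverflow link above):
# bad = tf.random.shuffle(loss)
# tf.gradients(tf.reduce_sum(bad), [v1, v2])  # LookupError

# Shuffling indices instead keeps the graph differentiable: tf.gather has
# a registered gradient, and the permuted index tensor is constant with
# respect to the variables.
indices = tf.random.shuffle(tf.range(tf.shape(loss)[0]))
shuffled = tf.gather(loss, indices)
grads = tf.gradients(tf.reduce_sum(shuffled), [v1, v2])  # works
```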
Thank you very much for your detailed explanation!
- The loss list is not shuffled when using tf.random.shuffle(loss) because, I think, tf.random.shuffle is not an in-place operation, so the input argument loss itself is left unchanged.
- It seems loss = tf.random.shuffle(loss) does not throw an error with tf 1.15.3. Maybe the gradient for this operation is registered in newer versions.

Overall, I think loss = tf.gather(loss, tf.random.shuffle(tf.range(tf.shape(loss)[0]))) is the better choice for compatibility.
I am sorry, I made a mistake: the gradient is still not defined for loss = tf.random.shuffle(loss) in tf 1.15.3. We should consider using loss = tf.gather(loss, tf.random.shuffle(tf.range(tf.shape(loss)[0]))) instead.
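To see the graph-mode pitfall end to end, here is a hedged sketch (again assuming TF 1.x; the constant values are arbitrary) showing that the bare call leaves loss untouched while the gather-based version actually permutes it:

```python
import tensorflow as tf  # sketch assumes TF 1.x graph mode

loss = tf.constant([0.1, 0.2, 0.3])

# The bare call adds a RandomShuffle op whose output is never consumed,
# so it never executes: loss keeps its original order.
tf.random.shuffle(loss)

# The gather-based version returns a tensor that is actually shuffled.
shuffled = tf.gather(loss, tf.random.shuffle(tf.range(tf.shape(loss)[0])))

with tf.Session() as sess:
    print(sess.run(loss))      # always [0.1 0.2 0.3]
    print(sess.run(shuffled))  # a random permutation of loss
```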