
SNGP for additional layers: Conv1D, GRU, or LSTM

puentene opened this issue 3 years ago • 7 comments

1. The URL of interest:

https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/spectral_normalization.py

2. Describe the feature you request:

We are interested in applying the SNGP approach to a problem (and hence a model) we are working on, to see whether it is able to distinguish out-of-domain data.

3. Additional context:

The model we are working with consists of three Conv1D layers followed by two GRU layers and a Dense layer. I followed the instructions from https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/understanding/sngp.ipynb#scrollTo=MOS9qFlW2o3J; however, the spectral normalization wrapper is only available for the Dense and Conv2D layers. Is it possible to apply the dense layer's spectral normalization to the layers I am using, or would the wrapper need to be modified?
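
For context, the architecture is roughly of the following shape (the filter counts, kernel sizes, unit counts, and input shape below are placeholders, not our actual settings):

```python
import tensorflow as tf

# Placeholder hyperparameters, just to illustrate the structure of the model.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(None, 8)),  # (timesteps, features)
    tf.keras.layers.Conv1D(32, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv1D(64, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv1D(64, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.GRU(64, return_sequences=True),
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(1),  # output layer
])
```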

4. Are you willing to contribute it?

Yes, within my abilities, as I am not from the ML field.

puentene avatar Dec 06 '21 09:12 puentene

@jereliu adding the author for concrete suggestions

saberkun avatar Dec 14 '21 06:12 saberkun

Hi! I believe the spectral normalization wrapper needs to be modified to work properly with the Conv1D layer. It should not be too difficult to do (e.g., by modifying the implementation of the Conv2D wrapper so that it works with the Conv1D layer).

jereliu avatar Dec 14 '21 07:12 jereliu

So the basic calculation within the layer would stay the same? I guess this is not so easy for the GRU layer? Thank you already for the input!

puentene avatar Dec 14 '21 07:12 puentene

Hi! I believe the basic calculation should be the same (although I haven't implemented it myself). To make it work for the GRU layer, you probably need to subclass the official GRU layer and experiment with wrapping its dense layers with the spectral normalization wrapper. If you want to be careful, you can consider starting by wrapping only one of the main layers within the GRU.
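
For the basic usage of the existing wrapper on a standalone dense layer, I would expect something along these lines (a rough sketch, not tested here):

```python
import tensorflow as tf
from official.nlp.modeling.layers import spectral_normalization

# Wrap an ordinary dense layer with the existing spectral normalization wrapper.
dense = tf.keras.layers.Dense(64, activation='relu')
sn_dense = spectral_normalization.SpectralNormalization(dense, norm_multiplier=0.95)

# The wrapped layer is then used exactly like the original one.
outputs = sn_dense(tf.random.normal([8, 32]))  # shape: (8, 64)
```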


jereliu avatar Dec 14 '21 07:12 jereliu


So you would suggest something like this:

```python
import tensorflow as tf


class SpectralNormalizationConv1D(tf.keras.layers.Wrapper):
  """Implements spectral normalization for the Conv1D layer based on [3]."""

  def __init__(self,
               layer,
               iteration=1,
               norm_multiplier=0.95,
               training=True,
               aggregation=tf.VariableAggregation.MEAN,
               legacy_mode=False,
               **kwargs):
    """Initializer.

    Args:
      layer: (tf.keras.layers.Layer) A TF Keras layer to apply normalization
        to.
      iteration: (int) The number of power iterations to perform to estimate
        the weight matrix's singular value.
      norm_multiplier: (float) Multiplicative constant to threshold the
        normalization. Usually under normalization, the singular value will
        converge to this value.
      training: (bool) Whether to perform power iteration to update the
        singular value estimate.
      aggregation: (tf.VariableAggregation) Indicates how a distributed
        variable will be aggregated. Accepted values are constants defined in
        the class tf.VariableAggregation.
      legacy_mode: (bool) Whether to use the legacy implementation where the
        dimension of the u and v vectors are set to the batch size. It should
        not be enabled unless for backward compatibility reasons.
      **kwargs: (dict) Other keyword arguments for the layers.Wrapper class.
    """
    self.iteration = iteration
    self.do_power_iteration = training
    self.aggregation = aggregation
    self.norm_multiplier = norm_multiplier
    self.legacy_mode = legacy_mode

    # Set layer attributes.
    layer._name += '_spec_norm'

    if not isinstance(layer, tf.keras.layers.Conv1D):
      raise ValueError(
          'layer must be a `tf.keras.layers.Conv1D` instance. You passed: {input}'
          .format(input=layer))
    super(SpectralNormalizationConv1D, self).__init__(layer, **kwargs)

  def build(self, input_shape):
    self.layer.build(input_shape)
    self.layer.kernel._aggregation = self.aggregation  # pylint: disable=protected-access
    self._dtype = self.layer.kernel.dtype

    # Conv1D kernel shape: (kernel_size, in_channel, out_channel).
    self.w = self.layer.kernel
    self.w_shape = self.w.shape.as_list()
    # Conv1D strides is a length-1 tuple; keep the scalar stride.
    self.stride = self.layer.strides[0]

    # Set the dimensions of u and v vectors.
    batch_size = input_shape[0]
    uv_dim = batch_size if self.legacy_mode else 1

    # Resolve shapes: a single spatial dimension instead of height/width,
    # changed from the Conv2D version.
    in_width = input_shape[1]
    in_channel = self.w_shape[1]

    out_width = in_width // self.stride
    out_channel = self.w_shape[2]

    self.in_shape = (uv_dim, in_width, in_channel)
    self.out_shape = (uv_dim, out_width, out_channel)
    self.uv_initializer = tf.initializers.random_normal()

    self.v = self.add_weight(
        shape=self.in_shape,
        initializer=self.uv_initializer,
        trainable=False,
        name='v',
        dtype=self.dtype,
        aggregation=self.aggregation)

    self.u = self.add_weight(
        shape=self.out_shape,
        initializer=self.uv_initializer,
        trainable=False,
        name='u',
        dtype=self.dtype,
        aggregation=self.aggregation)

    super(SpectralNormalizationConv1D, self).build()

  def call(self, inputs):
    u_update_op, v_update_op, w_update_op = self.update_weights()
    output = self.layer(inputs)
    w_restore_op = self.restore_weights()

    # Register update ops.
    self.add_update(u_update_op)
    self.add_update(v_update_op)
    self.add_update(w_update_op)
    self.add_update(w_restore_op)

    return output

  def update_weights(self):
    """Computes power iteration for convolutional filters (1D analogue of [3])."""
    # Initialize u, v vectors.
    u_hat = self.u
    v_hat = self.v

    if self.do_power_iteration:
      for _ in range(self.iteration):
        # Updates v.
        v_ = tf.nn.conv1d_transpose(
            u_hat,
            self.w,
            output_shape=self.in_shape,
            strides=self.stride,
            padding='SAME')
        v_hat = tf.nn.l2_normalize(tf.reshape(v_, [1, -1]))
        v_hat = tf.reshape(v_hat, v_.shape)

        # Updates u.
        u_ = tf.nn.conv1d(v_hat, self.w, stride=self.stride, padding='SAME')
        u_hat = tf.nn.l2_normalize(tf.reshape(u_, [1, -1]))
        u_hat = tf.reshape(u_hat, u_.shape)

    v_w_hat = tf.nn.conv1d(v_hat, self.w, stride=self.stride, padding='SAME')

    sigma = tf.matmul(tf.reshape(v_w_hat, [1, -1]), tf.reshape(u_hat, [-1, 1]))
    # Convert sigma from a 1x1 matrix to a scalar.
    sigma = tf.reshape(sigma, [])

    u_update_op = self.u.assign(u_hat)
    v_update_op = self.v.assign(v_hat)

    w_norm = tf.cond((self.norm_multiplier / sigma) < 1,
                     lambda: (self.norm_multiplier / sigma) * self.w,
                     lambda: self.w)

    w_update_op = self.layer.kernel.assign(w_norm)

    return u_update_op, v_update_op, w_update_op

  def restore_weights(self):
    """Restores layer weights to maintain gradient update (See Alg 1 of [1])."""
    return self.layer.kernel.assign(self.w)
```
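
And, if I understand the intended usage correctly, a quick (untested) sanity check would be along these lines, with arbitrary placeholder shapes:

```python
import tensorflow as tf

# Untested sanity check, assuming the SpectralNormalizationConv1D class above is defined.
conv = tf.keras.layers.Conv1D(64, kernel_size=3, strides=1, padding='same')
sn_conv = SpectralNormalizationConv1D(conv, norm_multiplier=0.95)

x = tf.random.normal([4, 128, 16])  # (batch, width, channels)
y = sn_conv(x)                      # builds the wrapper and runs one power iteration
print(y.shape)                      # expected: (4, 128, 64)
```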

puentene avatar Dec 21 '21 12:12 puentene


Hi again! Sorry, I am not quite sure I understand correctly: would you apply the normalization (as for the dense layer) to the weights of the hidden layer at each step, i.e., in https://github.com/keras-team/keras/blob/v2.7.0/keras/layers/recurrent_v2.py#L204-L525, h = z * h_tm1 + (1 - z) * hh on line 595, or to the last_output of backend.rnn? Or did I completely misunderstand and you mean an actual Dense layer in the GRU? Thank you for any help in advance!

puentene avatar Dec 21 '21 12:12 puentene

Hi!

In general, I think spectral normalization is mostly intended to be applied to the hidden layers (not the input layer or the output layer).

In the context of the standard_gru function, I think the spectral normalization should be applied to the recurrent_kernel updates (https://github.com/keras-team/keras/blob/2c48a3b38b6b6139be2da501982fd2f61d7d48fe/keras/layers/recurrent_v2.py#L585), where the dot product + bias add operation on lines 585-586 is essentially a dense layer. So you may want to consider replacing this with a spectral-normalized residual block.
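
To make that concrete, a rough, untested sketch could look like the following: a minimal GRU-style cell in which the hidden-to-hidden transform is an ordinary dense layer wrapped with the existing SpectralNormalization wrapper. The class name and gate details here are only illustrative, not an official implementation; the state update follows the same h = z * h_tm1 + (1 - z) * hh form quoted above.

```python
import tensorflow as tf
from official.nlp.modeling.layers import spectral_normalization


class SpecNormGRUCell(tf.keras.layers.Layer):
  """Minimal GRU-style cell whose recurrent transform is spectral-normalized."""

  def __init__(self, units, norm_multiplier=0.95, **kwargs):
    super().__init__(**kwargs)
    self.units = units
    self.state_size = units
    # Input-to-hidden transform for the z, r, and candidate gates (stacked).
    self.input_dense = tf.keras.layers.Dense(3 * units)
    # Hidden-to-hidden transform, wrapped with spectral normalization.
    self.recurrent_dense = spectral_normalization.SpectralNormalization(
        tf.keras.layers.Dense(3 * units, use_bias=False),
        norm_multiplier=norm_multiplier)

  def call(self, inputs, states):
    h_tm1 = states[0]
    x_z, x_r, x_h = tf.split(self.input_dense(inputs), 3, axis=-1)
    h_z, h_r, h_h = tf.split(self.recurrent_dense(h_tm1), 3, axis=-1)
    z = tf.sigmoid(x_z + h_z)    # update gate
    r = tf.sigmoid(x_r + h_r)    # reset gate
    hh = tf.tanh(x_h + r * h_h)  # candidate state
    h = z * h_tm1 + (1 - z) * hh
    return h, [h]


# Drop the cell into a standard RNN layer in place of tf.keras.layers.GRU:
# gru = tf.keras.layers.RNN(SpecNormGRUCell(64), return_sequences=True)
```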

Thanks,

jereliu avatar Dec 21 '21 17:12 jereliu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Mar 01 '23 22:03 google-ml-butler[bot]


Closing as stale. Please reopen if you'd like to work on this further. Thanks

laxmareddyp avatar Mar 13 '23 17:03 laxmareddyp