Internal seed of keras.layers.Dropout gets incremented by 1 every time a tf.function is created from the model
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): no
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux CentOS
- TensorFlow installed from (source or binary): PYPI
- TensorFlow version (use command below): 2.11.0
- Python version: 3.7
- Bazel version (if compiling from source):
- GPU model and memory:
- Exact command to reproduce:
Describe the problem.
When using keras.layers.Dropout with a user-defined seed, the seed gets incremented by 1 every time tf.function is called (e.g. in make_train_function, ...).
See the code and log below. As you can see, every time I run tf.function, the seed of the Keras dropout layer gets increased, BUT the seed of the manually crafted dropout built from tf.random.uniform does not, so I assume it is a Keras problem. I tried to pinpoint the error and found the following code:
https://github.com/keras-team/keras/blob/d2a6b9e0efdd01dd1bc6cf0bb52336d0009aba6c/keras/backend.py#L2007-L2025
Here the seed gets increased "When user didn't provide any original seed". However, the check is wrong, as self._seed is the user-defined seed. It gets set in the constructor and also here: https://github.com/keras-team/keras/blob/d2a6b9e0efdd01dd1bc6cf0bb52336d0009aba6c/keras/backend.py#L1974-L1976
https://github.com/keras-team/keras/blob/d2a6b9e0efdd01dd1bc6cf0bb52336d0009aba6c/keras/backend.py#L2027-L2033
So I think the make_legacy_seed function needs to be fixed. I'm not sure about the comment "... it is important to generate different seed for stateful ops ...", because the underlying tf.random.uniform stores its state in the PhiloxRandom C++ class, which is incremented throughout the training process and only reset when triggered by tf.random.set_seed(...). I don't see why the seed needs to be manually increased here.
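For illustration, a minimal eager-mode sketch of that stateful behavior (the A1/A2 labels in the comments are just placeholders for whatever values come out):

import tensorflow as tf

tf.random.set_seed(123)
print(tf.random.uniform([1], seed=345))  # some value A1
print(tf.random.uniform([1], seed=345))  # a different value A2: the internal counter advanced
tf.random.set_seed(123)
print(tf.random.uniform([1], seed=345))  # A1 again: set_seed reset the counter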
Describe the current behavior.
Every time we run tf.function the seed of the keras.layers.Dropout gets increased.
Describe the expected behavior.
The seeds should be stable, so we can have deterministic behavior.
- Do you want to contribute a PR? (yes/no): yes
- If yes, please read this page for instructions
- Briefly describe your candidate solution(if contributing):
Personally I would remove the make_legacy_seed method, but I'm not sure if there was a special purpose for this behavior. Alternatively, we could store self.user_provided_seed = seed is not None and check for this within make_legacy_seed, as sketched below.
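A minimal sketch of that second option (the names and structure here are illustrative only, not the actual keras/backend.py code):

# Hypothetical sketch, not the real RandomGenerator implementation.
class RandomGenerator:
    def __init__(self, seed=None):
        self._seed = seed
        # Remember whether the seed came from the user or was auto-generated.
        self._user_provided_seed = seed is not None

    def make_legacy_seed(self):
        if self._seed is not None and not self._user_provided_seed:
            # Only auto-increment seeds that Keras generated itself.
            result = self._seed
            self._seed += 1
            return result
        # A user-provided seed stays stable across tf.function retracings.
        return self._seed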
Standalone code to reproduce the issue.
import tensorflow as tf
import numpy as np
tf.random.set_seed(123)
rate = 0.5
inp = tf.keras.Input((5,))
x = tf.keras.layers.Dropout(rate=rate, seed=345)(inp)
y = tf.cast(tf.random.uniform(shape=tf.shape(inp), dtype=inp.dtype, seed=345) >= rate, inp.dtype) * inp * (1/rate)
model = tf.keras.Model(inputs=[inp], outputs=[x, y])
args = np.ones((1, 5,), np.float32)
counter = 0
def print_seeds():
    global counter
    print(f"Run #{counter}")
    counter += 1
    print("\tModel:")
    for l in model.layers:
        if hasattr(l, 'seed'):
            print("\t\t", l.name, l.seed)
    print("\tGraph:")
    func = tf.function(model.__call__)
    graph = func.get_concrete_function(args, training=True).graph
    for node in graph.get_operations():
        if node.op_def.name == "RandomUniform":
            print("\t\t", node.name, 'Seed:', node.get_attr('seed'), 'Seed2:', node.get_attr('seed2'))
    print()
print_seeds()
print_seeds()
print_seeds()
Source code / logs.
Run #0
    Model:
        dropout 345
    Graph:
        model/tf.random.uniform/random_uniform/RandomUniform Seed: 123 Seed2: 345
        model/dropout/dropout/random_uniform/RandomUniform Seed: 123 Seed2: 345

Run #1
    Model:
        dropout 345
    Graph:
        model/tf.random.uniform/random_uniform/RandomUniform Seed: 123 Seed2: 345
        model/dropout/dropout/random_uniform/RandomUniform Seed: 123 Seed2: 346

Run #2
    Model:
        dropout 345
    Graph:
        model/tf.random.uniform/random_uniform/RandomUniform Seed: 123 Seed2: 345
        model/dropout/dropout/random_uniform/RandomUniform Seed: 123 Seed2: 347
@SuryanarayanaY I was able to replicate the issue on Colab; please find the gist here. Thank you!
@qlzh727 could you take a look at this one? I'm not actually sure what our desired behavior is (or what legacy constraints we are dealing with).
I looked more into the details of this bug.
The reasoning for this kind of implementation is that if you have a layer that internally uses multiple Dropouts (e.g., an RNN), the user only needs to provide a single seed value, and the incrementing ensures that all sublayers get a unique seed. Otherwise all sublayers would generate identical dropout masks.
The problem with the current implementation is that this is global state that accumulates over time and cannot be reset.
I prepared a pull request and will submit it in a minute. In it I changed the behavior of the RandomGenerator: instead of global state that can never be reset, I created a scope object that carries this accumulating state. It is only alive within the def call(...) of the layer.
I know it's a big change that touches several modules, but in my opinion this is a fundamental problem that cannot be solved any other way.
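Conceptually, the scope object behaves like this (a rough sketch of the idea only, not the actual code from the PR):

# Illustrative sketch: a per-call scope that hands out consecutive seeds
# derived from a base seed, without mutating any global or layer state.
class SeedScope:
    def __init__(self, base_seed):
        self._next_seed = base_seed

    def next_seed(self):
        seed = self._next_seed
        self._next_seed += 1
        return seed

# Each call creates a fresh scope, so the sequence restarts every time:
scope = SeedScope(345)
print([scope.next_seed() for _ in range(3)])  # [345, 346, 347]
scope = SeedScope(345)
print([scope.next_seed() for _ in range(3)])  # [345, 346, 347] again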
@mergian, thanks for the PR. Our developer team will review it and let you know if anything more is required. Thanks.
I think the legacy seed behavior here is expected, e.g. if you apply dropout multiple times (regardless of whether it is seeded or not), it will produce a different dropout mask each time. The seeded version will give you a deterministic sequence of dropout masks.
Since the seed is just internal state and not part of the public API, apart from that, what do you expect from the dropout layer? E.g. the current behavior is:
import tensorflow as tf
import numpy as np
tf.keras.utils.set_random_seed(123)
inputs = np.random.random([4, 5, 6])
# print(inputs)
dropout = tf.keras.layers.Dropout(rate=0.5, seed=123)
output_1 = dropout(inputs, training=True)
output_2 = dropout(inputs, training=True)
# output_2 != output_1
tf.keras.utils.set_random_seed(123)
dropout_2 = tf.keras.layers.Dropout(rate=0.5, seed=123)
output_3 = dropout_2(inputs, training=True)
output_4 = dropout_2(inputs, training=True)
# output_3 != output_4
# output_1 == output_3
# output_2 == output_4
@mergian shared his thoughts in https://github.com/keras-team/keras/pull/17685#issuecomment-1476756246.
Let me check on our end and see what the exact reason is for increasing the seed in the legacy stateful ops code path.
I have two ideas why this might have been implemented:
- Maybe in the old days of TensorFlow, tf.random.uniform didn't store its local state. In that case it would have always produced the same results if Keras had not increased this value on every call. But I don't know if that has ever been the case.
- In layers that use multiple Dropouts (e.g., LSTM), this mechanism allows the user to provide a single seed and still produces different random numbers for all dropouts. This is also the behavior that my pull request preserves.
Simple Example:
class MultiDropout:
    def __init__(self, rate, seed, cnt):
        self.rate = rate
        self.seed = seed
        self.cnt = cnt

    def call(self, x):
        for _ in range(self.cnt):
            x = keras.layers.Dropout(self.rate, seed=self.seed)(x)
        return x
In this case, all dropouts would generate identical output, which is not intended. What Keras is doing at the moment is effectively this:
def call(self, x):
    for _ in range(self.cnt):
        x = keras.layers.Dropout(self.rate, seed=self.seed)(x)
        self.seed += 1
    return x
And my proposal is this:
def call(self, x):
    seed = self.seed
    for _ in range(self.cnt):
        x = keras.layers.Dropout(self.rate, seed=seed)(x)
        seed += 1
    return x
Sorry for the very late reply. I vaguely remember that this was used to mitigate a behavior discrepancy between eager and graph mode for seeded stateful ops. See the following example:
import tensorflow as tf

@tf.function
def func(seed=None):
    r1 = tf.random.uniform(shape=(2, 2), seed=seed)
    r2 = tf.random.uniform(shape=(2, 2), seed=seed)
    return r1, r2

print(func())
print(func(123))

def eager_func(seed=None):
    r1 = tf.random.uniform(shape=(2, 2), seed=seed)
    r2 = tf.random.uniform(shape=(2, 2), seed=seed)
    return r1, r2

print(eager_func())
print(eager_func(123))
Note that r1 and r2 from func() with seed 123 will produce the exact same values, which causes issues for initializers and dropout.
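Presumably that is what the +1 increment works around: if each op gets its own seed, the values differ inside tf.function as well. A sketch, reusing the imports above (func_distinct is just an illustrative name):

@tf.function
def func_distinct(seed):
    r1 = tf.random.uniform(shape=(2, 2), seed=seed)
    r2 = tf.random.uniform(shape=(2, 2), seed=seed + 1)  # distinct per-op seed
    return r1, r2

print(func_distinct(123))  # r1 != r2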
Let me also add @wangpengmit, who is the RNG expert on the TF side.
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
Ping
I don't have a strong opinion on this. Increasing a user-provided seed does surprise me a bit. OTOH the problem in https://github.com/keras-team/tf-keras/issues/60 does exist. Letting two legacy stateful RNG ops use the same seed is treacherous: they'll produce the same output in tf.function but different outputs outside it.
It's up to the Keras team to decide Dropout's behavior on legacy stateful RNGs.