tf-keras
Tensorflow/Keras 2.10 initializer randomness changes : How to get 100% reproducible results?
Starting with TensorFlow 2.10, the behavior of weight initialization has changed. Previously, you could get perfectly reproducible results by simply setting the numpy / tensorflow global seeds, and the seeds for the data loader, without needing to delve into the model code itself.
However, with 2.10, it is now required to pass a specific initializer with a seed attached, e.g. keras.layers.Dense(1, kernel_initializer=keras.initializers.GlorotUniformV2(seed=1)). Obviously, this is a much cleaner choice from an API point of view; however, the following problems occur when you want to achieve perfectly reproducible results.
1.) For many architectures, we can NOT modify the code (e.g. when using some of the built-in applications), and hence cannot set initializer seeds. Arguably, this is usually not the main use case, as you'd load pre-trained weights, but even the classifier added on top wouldn't be fixed.
2.) When no initializer is passed, the default one is derived automatically based on the dtype / shape. This means that if I want to modify my existing architecture to include seeded initializers, I would first have to check the default initializer classes (for each layer!) and then assign seeds to all involved initializers, which is obviously going to be a major effort.
3.) We are also not able to solve this by backtracking the Keras graph, searching for initializer attributes and setting the seed / random generator attribute (see also my attempt below), because by the time we want to construct the Model(), the initializer has already been called and the weights created. This means we can only re-initialize the layer, as shown below.
4.) Even if we managed to automate this, my understanding is that given an initializer class, a fixed shape and a fixed seed, the generated values will always be the same. This means that if we happen to have the same variable shape twice, setting all initializers to the same seed would produce identical initial values, even if we don't necessarily want that. To get around this, we would probably need to "hack" the seed with something like hash(layer.name + model_seed), roughly as sketched below.
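To illustrate point 4, here is a minimal sketch (assuming TF >= 2.10; derive_seed and the layer name are hypothetical, not part of any Keras API): a seeded initializer is a pure function of (seed, shape), so two variables of the same shape get identical values unless each layer derives its own seed.

    import zlib
    import tensorflow as tf

    # Two separate instances with the same seed produce identical values for the same shape.
    init_a = tf.keras.initializers.GlorotUniform(seed=1)
    init_b = tf.keras.initializers.GlorotUniform(seed=1)
    w1 = init_a(shape=(3, 3))
    w2 = init_b(shape=(3, 3))
    print(bool(tf.reduce_all(w1 == w2)))  # True: identical values, possibly unwanted

    # Hypothetical per-layer seed derivation: hash the layer name together with a
    # global model seed so layers of equal shape still get different initial values.
    def derive_seed(layer_name: str, model_seed: int) -> int:
        return zlib.crc32(f"{layer_name}-{model_seed}".encode()) & 0x7FFFFFFF

    dense = tf.keras.layers.Dense(
        8,
        kernel_initializer=tf.keras.initializers.GlorotUniform(seed=derive_seed("dense_head", 42)),
    )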
Is there any more straightforward way of achieving this? This may be better suited as a feature request, with the feature being a concise way of ensuring model reproducibility.
import xxhash


def set_initializer_seeds(inputs, outputs, seeds) -> "Model":
    """
    Makes sure that every variable initializer has a certain seed attached for reproducibility.

    :param inputs: model input tensor(s)
    :param outputs: model output tensor(s) to backtrack from
    :param seeds: base seed mixed into every per-layer hash
    :return: a Model built from inputs and outputs with re-seeded initializers
    """
    # pylint: disable=protected-access
    from keras.models import Model
    from keras.layers import Layer
    from keras import backend as K

    visited = set()

    def _backtrack(cn):
        idx = id(cn)
        if idx in visited:
            return
        visited.add(idx)
        layer: "Layer" = cn.node.layer
        re_build = False
        # Find all public *_initializer attributes and give each one a
        # deterministic, layer-specific seed.
        for attr in dir(layer):
            if "initializer" in attr and not attr.startswith("_") and not attr.endswith("_"):
                init = getattr(layer, attr, None)
                if init is not None and hasattr(init, "seed") and init.seed is None:
                    hashed_seed = xxhash.xxh32(layer.name + attr + str(seeds)).intdigest()
                    init.seed = hashed_seed
                    if hasattr(init, "_random_generator"):
                        init._random_generator = K.RandomGenerator(
                            hashed_seed, rng_type="stateless"
                        )
                    re_build = True
        if re_build:
            # The weights were already created with the old (unseeded) initializers,
            # so drop them and re-build the layer with the newly seeded ones.
            layer._trainable_weights = []
            layer._non_trainable_weights = []
            layer.build(layer.input_shape)
        for inbound in cn.node.keras_inputs:
            _backtrack(inbound)

    for o in outputs:
        _backtrack(o)
    return Model(inputs, outputs)
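For context, a minimal usage sketch of the helper above (assuming xxhash is installed and the functional-API internals it touches behave as in my Keras version; layer sizes and names are purely illustrative):

    from keras.layers import Dense, Input

    inputs = Input(shape=(16,))
    x = Dense(32, activation="relu")(inputs)
    outputs = [Dense(1, name="head")(x)]

    # Re-seeds every initializer reachable from the outputs, then wraps into a Model.
    model = set_initializer_seeds(inputs, outputs, seeds=42)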
@Zahlii,
There is a behavior change for tf.keras.initializers in tensorflow v2.10. Keras initializers will now use stateless random ops to generate random numbers.
Both seeded and unseeded initializers will always generate the same values every time they are called (for a given variable shape). For unseeded initializers (seed=None), a random seed will be created and assigned at initializer creation (different initializer instances get different seeds).
An unseeded initializer will raise a warning if it is reused (called) multiple times. This is because it would produce the same values each time, which may not be intended. Thank you!
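A small sketch of the described behavior (assuming TF >= 2.10; names are illustrative):

    import tensorflow as tf

    # Seeded: deterministic for a given shape, across instances and calls.
    seeded_a = tf.keras.initializers.GlorotUniform(seed=7)
    seeded_b = tf.keras.initializers.GlorotUniform(seed=7)
    print(bool(tf.reduce_all(seeded_a((4, 4)) == seeded_b((4, 4)))))  # True

    # Unseeded: a seed is drawn once at construction, so repeated calls of the *same*
    # instance return identical values (and emit a reuse warning), while a new
    # instance gets a different seed and therefore different values.
    unseeded = tf.keras.initializers.GlorotUniform()
    a = unseeded((4, 4))
    b = unseeded((4, 4))                                 # same values as `a`, plus a warning
    c = tf.keras.initializers.GlorotUniform()((4, 4))    # different values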
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
@tilakrayal I'd like to re-open this, as the label suggested to me that we are still waiting for a reply from a tensorflow contributor.
I am perfectly aware of the introduced API changes you described, but as mentioned above they lead to a huge reproducibility problem which can NOT easily be fixed, as we do not control the code generating the initializers for most pre-trained / hub models!
Any news on this?
Consider the following examples:
import tensorflow as tf

tfk = tf.keras
tfkl = tf.keras.layers


def test1():
    tf.random.set_seed(40)
    init = tfk.initializers.HeNormal(seed=20)
    d1 = tfkl.Dense(5, kernel_initializer=init)
    d2 = tfkl.Dense(5, kernel_initializer=init)
    x = tf.random.normal((1, 1))
    d1(x)
    d2(x)
    return d1, d2


def test2():
    tf.random.set_seed(40)
    init = tfk.initializers.HeNormal()
    d1 = tfkl.Dense(5, kernel_initializer=init)
    d2 = tfkl.Dense(5, kernel_initializer=init)
    x = tf.random.normal((1, 1))
    d1(x)
    d2(x)
    return d1, d2


def test3():
    tf.random.set_seed(40)
    init = tf.random_normal_initializer(stddev=0.02)
    d1 = tfkl.Dense(5, kernel_initializer=init)
    d2 = tfkl.Dense(5, kernel_initializer=init)
    x = tf.random.normal((1, 1))
    d1(x)
    d2(x)
    return d1, d2


d1_1, d1_2 = test1()
d1_3, d1_4 = test1()
d2_1, d2_2 = test2()
d2_3, d2_4 = test2()
d3_1, d3_2 = test3()
d3_3, d3_4 = test3()
What I get is:
d1_1 = d1_2 = d1_3 = d1_4 (❌ since d1_1 = d1_2 may not be intended)
d2_1 = d2_2, d2_1 != d2_3, d2_2 != d2_4, d2_3 = d2_4 (❌ since d1_1 != d2_1)
d3_1 != d3_2, d3_1 = d3_3, d3_2 = d3_4 ✔️
I think it would be useful to have a way to use tf.keras.initializers as in test3(), where you can set the global seed, choose an initializer class, and then get an identical model each time the function is called, but without having the same initial weights for all layers.
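For reference, one possible workaround along these lines (a sketch, not taken from this thread; make_model and the layer sizes are illustrative) is to draw per-layer seeds from a Python RNG that is itself seeded with the global model seed, so layers differ from each other but repeated construction is fully reproducible:

    import random

    import tensorflow as tf

    def make_model(model_seed: int = 40) -> tf.keras.Model:
        rng = random.Random(model_seed)

        def init():
            # Each call draws a fresh, but reproducible, seed for one layer.
            return tf.keras.initializers.HeNormal(seed=rng.randrange(2**31))

        return tf.keras.Sequential([
            tf.keras.layers.Dense(5, kernel_initializer=init(), input_shape=(1,)),
            tf.keras.layers.Dense(5, kernel_initializer=init()),
        ])

    m1 = make_model(40)
    m2 = make_model(40)
    # m1 and m2 start with identical weights, while the two Dense layers within
    # each model are initialized differently.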
Keras initializers now use stateless random operations, ensuring that both seeded and unseeded initializers produce the same values upon each call for a given shape, with unseeded initializers automatically getting a unique seed at creation and raising a warning if reused due to their now-deterministic output.
I also tried executing the mentioned code on the latest Keras 3.0 and observed that it ran without failures/errors. Kindly find the gist of it here. Thank you!