
TensorFlow/Keras 2.10 initializer randomness changes: How to get 100% reproducible results?

Open Zahlii opened this issue 3 years ago • 5 comments

Starting with TensorFlow 2.10, the behavior of weight initialization has changed. Previously, you could get perfectly reproducible results simply by setting the NumPy / TensorFlow global seeds, plus the seeds for the data loader, without needing to delve into the model code itself.
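
For context, a minimal sketch of the pre-2.10 recipe this refers to (global seeding only, no per-layer changes):

```python
import numpy as np
import tensorflow as tf

# Before TF 2.10, setting the global seeds was enough to make
# weight initialization reproducible across runs.
np.random.seed(0)
tf.random.set_seed(0)
```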

However, with 2.10, it is now required to pass a specific initializer with a seed attached, e.g. keras.layers.Dense(1, kernel_initializer=keras.initializers.GlorotUniformV2(seed=1)). This is obviously a much cleaner choice from an API point of view; however, the following problems occur when you want to achieve perfectly reproducible results.
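
For illustration, a minimal sketch of the new per-initializer seeding (using the public `GlorotUniform` name; the exact class name may differ by version):

```python
from tensorflow import keras

# Since 2.10, the seed is attached to the initializer itself.
layer = keras.layers.Dense(
    1, kernel_initializer=keras.initializers.GlorotUniform(seed=1)
)
```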

1. For many architectures, we can NOT modify the code (e.g. when using some built-in applications), and hence cannot set initializer seeds. Arguably, this is usually not the main use case, as you'd load pre-trained weights, but even the classifier added on top wouldn't be fixed.
2. When passing no initializer, the default type is derived automatically based on the dtype / shape. This means that if I want to modify my existing architecture to include seeded initializers, I would first have to check the default initializer classes (for each layer!) and then assign seeds to all involved initializers, which is obviously a major effort.
3. We are also not able to solve this by backtracking the Keras graph, searching for initializer attributes and setting the seed / random-generator attribute (see also my attempt below), because by the time we want to construct the Model(), the initializers have already been called and the weights created. This means we can only re-initialize the layer, as shown below.
4. Even if we managed to automate this, my understanding is that given an initializer class, a fixed shape, and a fixed seed, the values will always be the same. So if by any chance we have the same variable shape twice, setting all initializers to the same seed would produce identical weights, even if we don't necessarily want this. We would probably need to "hack" around this and derive the seed from something like hash(layer.name + model_seed); see the sketch after this list.
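
To illustrate point 4: two initializers with the same seed and shape yield identical values, so a stable per-layer hash is needed to get distinct seeds. `derive_seed` below is a hypothetical helper, not part of Keras:

```python
import zlib

import tensorflow as tf

# Same class + same seed + same shape => identical values,
# even across independent layers.
a = tf.keras.initializers.GlorotUniform(seed=1)((4, 4))
b = tf.keras.initializers.GlorotUniform(seed=1)((4, 4))
assert bool(tf.reduce_all(a == b))


def derive_seed(layer_name: str, model_seed: int) -> int:
    """Derive a distinct, stable per-layer seed from one model-level seed."""
    # zlib.crc32 is stable across processes; Python's built-in hash() is
    # salted per process and would break reproducibility across runs.
    return zlib.crc32(f"{layer_name}:{model_seed}".encode()) % (2**31)
```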

Is there any more straightforward way of achieving this? This may be better suited as a feature request, with the feature being a concise way of ensuring model reproducibility.

```python
import xxhash


def set_initializer_seeds(inputs, outputs, seeds) -> "Model":
    """
    Make sure every variable initializer has a seed attached, for reproducibility.

    :param inputs: input tensor(s) of the functional model
    :param outputs: output tensors of the functional model
    :param seeds: model-level seed mixed into every per-layer hash
    :return: a Model built from the re-initialized graph
    """
    # pylint: disable=protected-access
    from keras.models import Model
    from keras.layers import Layer
    from keras import backend as K

    visited = set()

    def _backtrack(cn):
        idx = id(cn)
        if idx in visited:
            return
        visited.add(idx)
        layer: "Layer" = cn.node.layer
        re_build = False
        # Find public *initializer* attributes on the layer.
        for attr in dir(layer):
            if "initializer" in attr and not attr.startswith("_") and not attr.endswith("_"):
                init = getattr(layer, attr, None)
                if init is not None and hasattr(init, "seed") and init.seed is None:
                    # Stable per-layer, per-attribute seed derived from the model seed.
                    hashed_seed = xxhash.xxh32(layer.name + attr + str(seeds)).intdigest()
                    init.seed = hashed_seed
                    if hasattr(init, "_random_generator"):
                        init._random_generator = K.RandomGenerator(
                            hashed_seed, rng_type="stateless"
                        )
                        re_build = True
        if re_build:
            # The weights were already created with the old seeds,
            # so drop them and rebuild the layer.
            layer._trainable_weights = []
            layer._non_trainable_weights = []
            layer.build(layer.input_shape)

        for inbound in cn.node.keras_inputs:
            _backtrack(inbound)

    for o in outputs:
        _backtrack(o)

    return Model(inputs, outputs)
```
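
For reference, a hypothetical usage of the helper above on a small functional graph (it relies on Keras internals such as `cn.node`, so it is version-sensitive):

```python
from tensorflow import keras

inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(1)(keras.layers.Dense(4)(inputs))

# Assign deterministic per-layer seeds, then build the model.
model = set_initializer_seeds(inputs, [outputs], seeds=42)
```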

Zahlii avatar Sep 12 '22 14:09 Zahlii

@Zahlii, There is a behavior change for tf.keras.initializers in TensorFlow v2.10. Keras initializers now use stateless random ops to generate random numbers.

Both seeded and unseeded initializers will always generate the same values every time they are called (for a given variable shape). For unseeded initializers (seed=None), a random seed will be created and assigned at initializer creation (different initializer instances get different seeds).

An unseeded initializer will raise a warning if it is reused (called) multiple times. This is because it would produce the same values each time, which may not be intended. Thank you!
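
A short sketch of the described behavior, using the public tf.keras API:

```python
import tensorflow as tf

# A seeded initializer is stateless: repeated calls with the same shape
# return identical values.
seeded = tf.keras.initializers.GlorotUniform(seed=42)
assert bool(tf.reduce_all(seeded((3, 3)) == seeded((3, 3))))

# An unseeded initializer receives a random seed at construction time,
# so each instance is deterministic too; calling it more than once
# emits a warning because the values repeat.
unseeded = tf.keras.initializers.GlorotUniform()
first = unseeded((3, 3))
second = unseeded((3, 3))  # triggers the reuse warning
```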

tilakrayal avatar Sep 13 '22 08:09 tilakrayal

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Sep 20 '22 09:09 google-ml-butler[bot]

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] avatar Sep 27 '22 09:09 google-ml-butler[bot]


@tilakrayal I'd like to re-open this, as the label suggested to me that we are still waiting for a reply from a tensorflow contributor.

I am perfectly aware of the introduced API changes you described, but as mentioned above they lead to a huge reproducibility problem which can NOT easily be fixed, as we do not control the code generating the initializers for most pre-trained / hub models!

Zahlii avatar Sep 27 '22 09:09 Zahlii