Conditional MaskedAutoregressiveFlow outputs NaN
Hi, I am experimenting with the MaskedAutoregressiveFlow bijector. Essentially I want it to map a length-20 vector drawn from some base distribution to another distribution, conditioned on an input value (a length-10 vector). Below is my code.
```python
import tensorflow_probability as tfp
import tensorflow as tf
import numpy as np

tfd = tfp.distributions
tfb = tfp.bijectors

input = np.ones(shape=(20,)).astype(np.float32)
condition = np.random.normal(size=(10,)).astype(np.float32) + 10

fn = tfb.AutoregressiveNetwork(params=2, event_shape=20, conditional=True,
                               conditional_event_shape=10, hidden_units=[10, 10])
bijector = tfb.MaskedAutoregressiveFlow(fn)
print(bijector.forward(input, conditional_input=condition))
```
Running the code gives me:
tf.Tensor( [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan], shape=(20,), dtype=float32)
I don't think my input or condition vectors are big enough to cause exploding values. Any ideas on what I might be doing wrong? Thanks!
For reference, I am using tensorflow-probability = 0.11.0
I looked into the code, and apparently the issue involves the underlying autoregressive network: it doesn't allow setting an activation function for the conditional inputs. In particular, I'm talking about the `conditional_output` below.
```python
def build(self, input_shape):
  """See tfkl.Layer.build."""
  if self._event_shape is None:
    # `event_shape` wasn't specified at __init__, so infer from `input_shape`.
    self._event_shape = [tf.compat.dimension_value(input_shape[-1])]
    self._event_size = self._event_shape[-1]
    self._event_ndims = len(self._event_shape)
    # Should we throw if input_shape has rank > 2?

  if input_shape[-1] != self._event_shape[-1]:
    raise ValueError('Invalid final dimension of `input_shape`. '
                     'Expected `{!r}`, but got `{!r}`'.format(
                         self._event_shape[-1], input_shape[-1]))

  # Construct the masks.
  self._input_order = _create_input_order(
      self._event_size,
      self._input_order_param,
  )
  self._masks = _make_dense_autoregressive_masks(
      params=self._params,
      event_size=self._event_size,
      hidden_units=self._hidden_units,
      input_order=self._input_order,
      hidden_degrees=self._hidden_degrees,
  )

  outputs = [tf.keras.Input((self._event_size,), dtype=self.dtype)]
  inputs = outputs[0]
  if self._conditional:
    conditional_input = tf.keras.Input((self._conditional_size,),
                                       dtype=self.dtype)
    inputs = [inputs, conditional_input]

  # Input-to-hidden, hidden-to-hidden, and hidden-to-output layers:
  #  [..., self._event_size] -> [..., self._hidden_units[0]].
  #  [..., self._hidden_units[k-1]] -> [..., self._hidden_units[k]].
  #  [..., self._hidden_units[-1]] -> [..., event_size * self._params].
  layer_output_sizes = self._hidden_units + [self._event_size * self._params]
  for k in range(len(self._masks)):
    autoregressive_output = tf.keras.layers.Dense(
        layer_output_sizes[k],
        activation=None,
        use_bias=self._use_bias,
        kernel_initializer=_make_masked_initializer(
            self._masks[k], self._kernel_initializer),
        bias_initializer=self._bias_initializer,
        kernel_regularizer=self._kernel_regularizer,
        bias_regularizer=self._bias_regularizer,
        kernel_constraint=_make_masked_constraint(
            self._masks[k], self._kernel_constraint),
        bias_constraint=self._bias_constraint,
        dtype=self.dtype)(outputs[-1])
    if (self._conditional and
        ((self._conditional_layers == 'all_layers') or
         ((self._conditional_layers == 'first_layer') and (k == 0)))):
      conditional_output = tf.keras.layers.Dense(
          layer_output_sizes[k],
          activation=None,
          use_bias=False,
          kernel_initializer=self._kernel_initializer,
          bias_initializer=None,
          kernel_regularizer=self._kernel_regularizer,
          bias_regularizer=None,
          kernel_constraint=self._kernel_constraint,
          bias_constraint=None,
          dtype=self.dtype)(conditional_input)
      outputs.append(tf.keras.layers.Add()([
          autoregressive_output,
          conditional_output]))
    else:
      outputs.append(autoregressive_output)
    if k + 1 < len(self._masks):
      outputs.append(
          tf.keras.layers.Activation(self._activation)
          (outputs[-1]))
  self._network = tf.keras.models.Model(
      inputs=inputs,
      outputs=outputs[-1])
  # Record that the layer has been built.
  super(AutoregressiveNetwork, self).build(input_shape)
```
Neural-net initializers tend to assume that the network inputs have elements with mean 0 and variance 1 (across the set of possible inputs). The elements of your condition vector have a magnitude of about 10, however, so one could expect something to break, especially with the default affine pointwise transformation `tfb.MaskedAutoregressiveFlow` uses. I'd consider the following options:
- Normalize your condition values to be smaller, in both mean and magnitude.
- Adjust the `kernel_initializer` of the entire AR network to be smaller (a good idea anyway; these things train best when they're initialized close to an identity transformation).
- We (TFP devs) could probably provide a separate kernel initializer just for the linear conditional transformation, so you could control the scaling separately.
As for the conditional transformation's activation function not being configurable, that's pretty typical as far as conditioning in neural nets goes: the conditional inputs are usually just projected linearly and added onto the pre-activation outputs of the layers.
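Outside of TFP, that additive pre-activation conditioning pattern looks roughly like this; a minimal sketch with made-up layer sizes, where the conditional input gets a bias-free linear projection and the nonlinearity is applied only after the sum:

```python
import numpy as np
import tensorflow as tf

x = tf.keras.Input((20,))
cond = tf.keras.Input((10,))

# Pre-activation layer output (a plain Dense here for brevity).
h_pre = tf.keras.layers.Dense(32, activation=None)(x)
# Linear, bias-free projection of the conditional input -- no activation,
# mirroring the `conditional_output` branch in the TFP source quoted above.
cond_proj = tf.keras.layers.Dense(32, activation=None, use_bias=False)(cond)
# Add before the nonlinearity; only the sum goes through the activation.
h = tf.keras.layers.Activation('relu')(
    tf.keras.layers.Add()([h_pre, cond_proj]))

model = tf.keras.Model([x, cond], h)
out = model([np.ones((1, 20), np.float32), np.ones((1, 10), np.float32)])
print(out.shape)  # (1, 32)
```

Since the projection is linear and applied before the shared activation, a separate activation for the conditional branch would be redundant; scaling is the only knob that matters, which is why a dedicated conditional kernel initializer (the third option above) would help.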