Edward crashes on some values of n_steps

Open lazypanda1 opened this issue 7 years ago • 2 comments

When I run this program on Edward with certain values of n_steps, it crashes (error message below). When n_steps < 7, the program does not crash. I was trying to tune HMC to get some output, but I ran into crashes whenever n_steps >= 7. Is a crash expected here? If there is something wrong with the model, could Edward produce a more meaningful error message?

import edward as ed, tensorflow as tf, numpy as np
ed.set_seed(66585)
datax = np.array([23.79120344627258, 81.77398173144337, 96.96636792410925, 85.02141658660774, 35.12440569598619, 56.182053744711645, 65.25732396608474, 5.617963797883707])
datay = np.array([24.79120344627258, 82.77398173144337, 97.96636792410925, 86.02141658660774, 36.12440569598619, 57.182053744711645, 66.25732396608474, 6.617963797883707])
datax = datax.reshape((8, 1))
X = tf.placeholder(tf.float32, [8, 1])
p1 = ed.models.Gamma((tf.ones(1) * 55.36914775933779), (tf.ones(1) * 91.15324589106014))
p2 = ed.models.Exponential((tf.ones(1) * 84.52276002183024))
y = ed.models.Exponential((ed.dot(X, p1) + p2))
qw = ed.models.Empirical(params=tf.Variable(tf.random_normal([5000, 1])))
qb = ed.models.Empirical(params=tf.Variable(tf.random_normal([5000, 1])))
inference = ed.HMC({p1: qw, p2: qb}, data={X: datax, y: datay})
inference.run(step_size=0.04, n_steps=7)
print(qw.params.eval()[1000:5000:10].mean())
print(qb.params.eval()[1000:5000:10].mean())

Error message:

/usr/local/bin/python3.6 /home/edward/test.py
2018-03-02 12:38:10.938599: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py:52: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  not np.issubdtype(value.dtype, np.float) and \
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError:  : Tensor had NaN values
	 [[Node: inference/sample_7/VerifyFinite_1/CheckNumerics = CheckNumerics[T=DT_FLOAT, _class=["loc:@Gamma/sample/Reshape"], message="", _device="/job:localhost/replica:0/task:0/device:CPU:0"](invert_softplus_30/inverse/Softplus)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/edward/test.py", line 15, in <module>
    inference.run(step_size=0.04, n_steps=7)
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/inference.py", line 146, in run
    info_dict = self.update()
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/monte_carlo.py", line 138, in update
    _, accept_rate = sess.run([self.train, self.n_accept_over_t], feed_dict)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  : Tensor had NaN values
	 [[Node: inference/sample_7/VerifyFinite_1/CheckNumerics = CheckNumerics[T=DT_FLOAT, _class=["loc:@Gamma/sample/Reshape"], message="", _device="/job:localhost/replica:0/task:0/device:CPU:0"](invert_softplus_30/inverse/Softplus)]]

Caused by op 'inference/sample_7/VerifyFinite_1/CheckNumerics', defined at:
  File "/home/edward/test.py", line 15, in <module>
    inference.run(step_size=0.04, n_steps=7)
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/inference.py", line 125, in run
    self.initialize(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/hmc.py", line 64, in initialize
    return super(HMC, self).initialize(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/monte_carlo.py", line 101, in initialize
    self.train = self.build_update()
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/hmc.py", line 104, in build_update
    self.n_steps)
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/hmc.py", line 214, in leapfrog
    grad_log_joint = tf.gradients(log_joint(z_new), list(six.itervalues(z_new)))
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/hmc.py", line 167, in _log_joint_unconstrained
    return self._log_joint(z_sample_transformed) + log_det_jacobian
  File "/usr/local/lib/python3.6/site-packages/edward/inferences/hmc.py", line 197, in _log_joint
    x_copy = copy(x, dict_swap, scope=scope)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 237, in copy
    for arg in rv._args]
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 237, in <listcomp>
    for arg in rv._args]
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 88, in _copy_default
    x = copy(x, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 270, in copy
    new_op = copy(op, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 334, in copy
    elem = copy(x, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 270, in copy
    new_op = copy(op, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 334, in copy
    elem = copy(x, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 270, in copy
    new_op = copy(op, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 334, in copy
    elem = copy(x, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 270, in copy
    new_op = copy(op, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 324, in copy
    elem = copy(x, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 324, in copy
    elem = copy(x, dict_swap, scope, True, copy_q, False)
  File "/usr/local/lib/python3.6/site-packages/edward/util/random_variables.py", line 316, in copy
    op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback):  : Tensor had NaN values
	 [[Node: inference/sample_7/VerifyFinite_1/CheckNumerics = CheckNumerics[T=DT_FLOAT, _class=["loc:@Gamma/sample/Reshape"], message="", _device="/job:localhost/replica:0/task:0/device:CPU:0"](invert_softplus_30/inverse/Softplus)]]


Process finished with exit code 1

Environment:

python 3.6.2
edward 1.3.5

lazypanda1, Mar 06 '18 02:03

Looks like a NaN arises when HMC is applied to the inverse-softplus transform of the Gamma. The inverse-softplus transform unconstrains the Gamma distribution, which enables proper exploration. However, for values close enough to 0, the inverse-softplus fails.

dustinvtran, Mar 06 '18 07:03

So, is this a bug? Is there a better way to handle this in edward? I can help contribute if you have some suggestions.

lazypanda1, Mar 15 '18 17:03
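
For readers following the thread, the failure mode described in the reply above (inverse-softplus breaking down close to 0) can be reproduced outside Edward. The sketch below is NumPy only and is not Edward's or TensorFlow's actual implementation; the function names are hypothetical, and float32 is chosen because the traceback reports DT_FLOAT. It compares the direct formula log(exp(y) - 1) with an algebraically equivalent rewrite, y + log(-expm1(-y)), that should stay finite for small positive y. Whether this is exactly what happens inside the HMC update in the report is an assumption.

```python
import numpy as np


def softplus(x):
    # softplus(x) = log(1 + exp(x)), written with logaddexp for stability.
    return np.logaddexp(0.0, x)


def inverse_softplus_naive(y):
    # Direct formula. For small y, exp(y) rounds to 1 in float32,
    # so the argument of log becomes exactly 0.
    return np.log(np.exp(y) - 1.0)


def inverse_softplus_stable(y):
    # Equivalent rewrite: log(exp(y) - 1) = y + log(1 - exp(-y)).
    # expm1 keeps precision when its argument is close to 0.
    return y + np.log(-np.expm1(-y))


# A positive value very close to 0, stored as float32.
y = np.array([1e-8], dtype=np.float32)

with np.errstate(divide="ignore", invalid="ignore"):
    naive = inverse_softplus_naive(y)
    # Once an infinity enters further arithmetic it can become NaN.
    poisoned = naive - naive

stable = inverse_softplus_stable(y)

print("naive:   ", naive)
print("poisoned:", poisoned)
print("stable:  ", stable)
```

This does not change how Edward builds its transforms. It only illustrates why a constrained sample that drifts very close to 0 can produce a non-finite unconstrained value, which a CheckNumerics op would then report. In the original program, a smaller step_size or fewer leapfrog steps may keep the sampler away from that region, which would be consistent with the report that n_steps < 7 does not crash.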