
Attempting basic hidden Markov model

Open eamartin opened this issue 7 years ago • 13 comments

I'm trying to implement an HMM to learn my way around Edward. I'd be happy to contribute this as an example (https://github.com/blei-lab/edward/issues/22) once I get it working.

I'm trying to do inference on the hidden -> hidden transition matrix, the hidden -> observed emission matrix, and the hidden states themselves from observed data.

My current attempt:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Categorical, Uniform

chain_len = 30
n_hidden = 3
n_obs = 3
x_0 = Categorical(p=tf.fill([n_hidden], 1.0 / n_hidden))

# transition matrix
T = Uniform(tf.zeros([n_hidden, n_hidden]), tf.ones([n_hidden, n_hidden]))
T /= tf.reduce_sum(T, axis=0, keep_dims=True)

# emission matrix
E = Uniform(tf.zeros([n_obs, n_hidden]), tf.ones([n_obs, n_hidden]))
E /= tf.reduce_sum(E, axis=0, keep_dims=True)

# model
y_val = tf.placeholder(tf.int32, [chain_len])
x = tf.scan(lambda x_tm1, _: Categorical(p=T[:, x_tm1]),
            y_val, initializer=x_0)
y = tf.map_fn(lambda x_t: Categorical(p=E[:, x_t]), x)

# inference
qT = tf.Variable(Uniform(tf.zeros([n_hidden, n_hidden]), tf.ones([n_hidden, n_hidden])))
qT /= tf.reduce_sum(qT, axis=0, keep_dims=True)
qE = tf.Variable(Uniform(tf.zeros([n_obs, n_hidden]), tf.ones([n_obs, n_hidden])))
qE /= tf.reduce_sum(qE, axis=0, keep_dims=True)
px_init = tf.Variable(tf.random_uniform([chain_len, n_hidden]))
px_init /= tf.reduce_sum(px_init, axis=0, keep_dims=True)
qx = Categorical(p=px_init)

y_data = np.array(([0] * 10) + ([1] * 10) + ([2] * 10), dtype=np.int32)

inference = ed.KLqp({T: qT, E: qE, x: qx}, {y_val: y_data})
inference.run()

This fails with

Traceback (most recent call last):
  File "hmm.py", line 36, in <module>
    inference = ed.KLqp({T: qT, E: qE, x: qx}, {y_val: y_data})
  File "/env/src/edward/edward/inferences/klqp.py", line 54, in __init__
    super(KLqp, self).__init__(*args, **kwargs)
  File "/env/src/edward/edward/inferences/variational_inference.py", line 23, in __init__
    super(VariationalInference, self).__init__(*args, **kwargs)
  File "/env/src/edward/edward/inferences/inference.py", line 103, in __init__
    raise TypeError("Latent variable value has an invalid type.")
TypeError: Latent variable value has an invalid type.

From inspecting the Edward source, I can see I'm going to have problems as long as T, qT, E, qE, x, and y aren't instances of RandomVariable. The first four aren't RandomVariable instances because I couldn't figure out how to guarantee that a random variable is always a valid transition matrix. x and y aren't RandomVariables because the type information is lost in scan and map_fn.
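A quick way to see this type loss, assuming the same Edward and TensorFlow versions as the snippet above (so Categorical(p=...) is the right constructor):

import tensorflow as tf
from edward.models import Categorical

probs = tf.fill([3], 1.0 / 3)
x_0 = Categorical(p=probs)

# the loop body returns a Categorical, but tf.scan converts it to a plain
# Tensor, so the RandomVariable type (and its distribution) is discarded
x = tf.scan(lambda x_tm1, _: Categorical(p=probs),
            tf.zeros([5]), initializer=x_0)

print(type(x_0))  # an edward Categorical -- a RandomVariable
print(type(x))    # a tf.Tensor -- the RandomVariable type is gone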

Does anyone have any suggestions for how to work around these problems?

eamartin avatar Feb 10 '17 01:02 eamartin

I tried two different routes to fix this:

(1) Make T and E a list of Categorical(Dirichlet(...)) components, and change the loop bodies to return a Mixture of T and E according to x_t or x_tm1. This didn't work because x_t loses its type information in the loop and causes the error TypeError: cat must be a Categorical distribution, but saw: Tensor("scan/while/Identity_1:0", shape=(), dtype=int32)

(2) Convert T and E to plain tf.Variables, and just do inference on x while feeding in y_val. This failed because x still isn't a RandomVariable.

eamartin avatar Feb 10 '17 02:02 eamartin

I made approach (2) work by changing the TensorFlow loop into a Python loop. This bakes a fixed chain length into the graph, but that seems necessary since Edward uses separate Variables to infer each latent variable in the chain.

I believe converting the TF loop into a Python loop would also make approach (1) work. I also remembered that a softmax is a very easy way to produce valid transition matrices.
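For reference, a rough sketch of what approach (1) might look like with a Python loop, assuming the same Edward API as the snippets in this thread (argument names such as p= and alpha= vary across Edward/TensorFlow versions), and using tf.gather over Dirichlet-distributed rows in place of the Mixture:

import tensorflow as tf
from edward.models import Categorical, Dirichlet

chain_len = 30
n_hidden = 3
n_obs = 3

# each row is Dirichlet-distributed, so T and E are valid row-stochastic
# matrices by construction (rows rather than the columns used above)
T = Dirichlet(alpha=tf.ones([n_hidden, n_hidden]))  # transition rows
E = Dirichlet(alpha=tf.ones([n_hidden, n_obs]))     # emission rows

x_0 = Categorical(p=tf.fill([n_hidden], 1.0 / n_hidden))

x, y = [], []
for _ in xrange(chain_len):
    x_tm1 = x[-1] if x else x_0
    # pick out the row indexed by the previous (resp. current) state;
    # tf.gather works here because RandomVariables convert to tensors
    x_t = Categorical(p=tf.gather(T, x_tm1))
    y_t = Categorical(p=tf.gather(E, x_t))
    x.append(x_t)
    y.append(y_t)

One would then bind T, E, and each x_t to approximating random variables in ed.KLqp, e.g. the softmax-parameterized Categorical factors used in the code below.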

Here's my current code, which runs but doesn't give the results I expect:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Categorical, Dirichlet, Uniform, Mixture

chain_len = 30
n_hidden = 3
n_obs = 3

x_0 = Categorical(p=tf.fill([n_hidden], 1.0 / n_hidden))

# transition matrix
T = tf.nn.softmax(tf.Variable(tf.random_uniform([n_hidden, n_hidden])), dim=0)

# emission matrix
E = tf.nn.softmax(tf.Variable(tf.random_uniform([n_obs, n_hidden])), dim=0)

# model
x = []
y = []
for _ in xrange(chain_len):
    x_tm1 = x[-1] if x else x_0
    x_t = Categorical(p=T[:, x_tm1])
    y_t = Categorical(p=E[:, x_t])
    x.append(x_t)
    y.append(y_t)

# inference
qx = [Categorical(p=tf.nn.softmax(tf.Variable(tf.ones(n_hidden))))
      for _ in xrange(chain_len)]

y_data = ([0] * 10) + ([1] * 10) + ([2] * 10)
y_data = map(np.array, y_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print sess.run(T)

    inference = ed.KLqp(dict(zip(x, qx)), dict(zip(y, y_data)))
    inference.run(n_iter=20000)

    print sess.run(T)
    print sess.run([foo.p for foo in qx])
    print sess.run([foo.p for foo in y])

Given my 3 hidden states, 3 observation types, and long runs of identical observations, I expect the transition matrix to be close to diagonal and the emission matrix to look like a permutation matrix. Instead, I see non-converging loss values:

Iteration     1 [  0%]: Loss = 35.123
Iteration  2000 [ 10%]: Loss = 46.565
Iteration  4000 [ 20%]: Loss = 47.923
Iteration  6000 [ 30%]: Loss = 56.581
Iteration  8000 [ 40%]: Loss = 53.431
Iteration 10000 [ 50%]: Loss = 49.397
Iteration 12000 [ 60%]: Loss = 55.551
Iteration 14000 [ 70%]: Loss = 47.537
Iteration 16000 [ 80%]: Loss = 47.690
Iteration 18000 [ 90%]: Loss = 47.107
Iteration 20000 [100%]: Loss = 38.514

as well as non-uniform state probability distributions and very uniform observation probabilities. Any idea what's going wrong in my setup or in how the problem is being solved?

eamartin avatar Feb 10 '17 17:02 eamartin

I've been thinking about and talking with @dustinvtran about how to handle time series models in edward, including HMMs. Currently, I'm not quite sure how much benefit there is to be gained from implementing HMMs as a list of Categorical variables, nor how much benefit there is to having the hidden states be backed by tensorflow.

Maybe a clean way to deal with HMMs (and perhaps Gaussian hidden states as well) would be an edward.state_space module that contains a representation of the hidden state sequence for some observations and implements message passing on the chain via dynamic programming (this could be done in Cython to handle large-ish sequences). The messages at each node could then be used to compute the usual quantities of interest (pairwise and marginal distributions of the latent state) for use in local-global inference algorithms.
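A minimal NumPy sketch of that kind of message passing (a hypothetical standalone helper, not an existing edward.state_space module): scaled forward-backward on a discrete chain, returning the per-timestep marginals of the latent state and the log-likelihood.

import numpy as np

def forward_backward(pi, T, E, obs):
    # pi: (K,) initial distribution; T: (K, K) row-stochastic transitions;
    # E: (K, V) row-stochastic emissions; obs: length-n int observations
    n, K = len(obs), len(pi)
    alpha = np.zeros((n, K))  # scaled forward messages
    beta = np.zeros((n, K))   # scaled backward messages
    c = np.zeros(n)           # per-step normalizers p(y_t | y_{1:t-1})

    alpha[0] = pi * E[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = alpha[t - 1].dot(T) * E[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = T.dot(E[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]

    gamma = alpha * beta                       # p(x_t | y_{1:n}) per row
    gamma /= gamma.sum(axis=1, keepdims=True)
    log_lik = np.log(c).sum()                  # log p(y_{1:n})
    return gamma, log_lik

Pairwise marginals come out of the same messages; the point is just that these quantities are cheap to compute outside of TensorFlow and could be fed back into a local-global inference scheme.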

Alternatively, for HMMs at least, one can bypass all of this entirely by integrating out the latent states and working directly with p(y_{1:T} | theta), for which TensorFlow could be very efficient.
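As a sketch of that route, assuming TF 1.x graph mode as in the snippets above: the forward recursion in log space gives log p(y_{1:T} | pi, T, E) directly, with no per-timestep latent variables left to infer (the names here are illustrative, not an Edward API).

import tensorflow as tf

def hmm_log_likelihood(log_pi, log_T, log_E, obs):
    # log_pi: (K,) initial log-probs; log_T: (K, K) with log_T[i, j] =
    # log p(x_t = j | x_{t-1} = i); log_E: (K, V) emission log-probs;
    # obs: (T,) int32 observations
    log_E_t = tf.transpose(log_E)  # (V, K), so rows index observations

    # alpha_0(k) = log pi_k + log p(y_0 | x_0 = k)
    log_alpha_0 = log_pi + tf.gather(log_E_t, obs[0])

    def step(log_alpha, y_t):
        # log-sum-exp over the previous state, then add the emission term
        trans = tf.reduce_logsumexp(tf.expand_dims(log_alpha, 1) + log_T,
                                    axis=0)
        return trans + tf.gather(log_E_t, y_t)

    log_alpha = tf.scan(step, obs[1:], initializer=log_alpha_0)
    return tf.reduce_logsumexp(log_alpha[-1])

Priors on pi and on the rows of T and E (e.g. Dirichlets) could then be attached on top of this likelihood, so inference only has to deal with the global parameters, since the chain has been collapsed out.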

Also looping in @dustinvtran to get his thoughts. I am planning on putting in more time on this after the ICML deadline.

nfoti avatar Feb 20 '17 19:02 nfoti

Glad this is getting some attention :) Beyond a basic HMM, it would also be nice to support variations where the transition or emission operator is conditioned on some (deterministic, learnable) parameters theta_t. Another type of sequential stochastic model to consider when thinking about sequences in Edward is https://arxiv.org/abs/1605.07571.

I'm just learning about Bayesian inference/learning algorithms myself, so I don't know enough yet to contribute very much to design. I am very familiar with TensorFlow + numerical Python/Cython and can help with implementation.

eamartin avatar Feb 21 '17 02:02 eamartin

i'll have a lot more time to contribute to this after friday's icml deadline as well. :) (for example, finally finishing up conjugacy)

dustinvtran avatar Feb 22 '17 03:02 dustinvtran

Has anyone made progress on HMM implementation post-conjugacy? Would be happy to contribute!

jengelman avatar Apr 06 '17 16:04 jengelman

There's this thread on the Edward forum: https://discourse.edwardlib.org/t/a-simple-tensorflow-implementation-of-forward-backward/67. Conjugacy hasn't been merged in yet (#588), so that's what has personally been delaying my thoughts on the matter. I haven't figured out how exact marginals like the forward-backward in the link might fit in.

dustinvtran avatar Apr 06 '17 16:04 dustinvtran

I haven't had a lot of time to work on it lately. However, my view on the forward-backward algorithm is to use good implementations from other packages. For instance, @mattjj has some great packages with message passing algorithms for both HMMs/HSMMs and LDSs. I don't see a good reason to reinvent the wheel for these algorithms. The fact that inference in edward is composable means it's trivial to use the methods from external packages.

nfoti avatar Apr 07 '17 00:04 nfoti

I've actually been looking at @mattjj's work, and was considering Edward as an alternative to take advantage of running distributed or "Hogwild" SVI on my GPU with TensorFlow. Looking at the Message Passing Algorithms example, would I be able to replace ed.KLpq with an external inference engine (like the one from pyhsmm)?

jengelman avatar Apr 07 '17 06:04 jengelman

Two options for integration: (1) wrapping the pyhsmm algorithms in a pyfunc, or, more simply, (2) defining the parameters you'll learn with pyhsmm as tf.placeholders in the Edward model. You can feed these placeholders the values you learn from pyhsmm during training.
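A minimal sketch of option (2), with hypothetical names (T_ph, E_ph, current_T, current_E are illustrative, not Edward or pyhsmm API):

import tensorflow as tf

n_hidden, n_obs = 3, 3

# the transition and emission matrices live outside the TF graph; they are
# fed in on every step rather than being trained by TensorFlow
T_ph = tf.placeholder(tf.float32, [n_hidden, n_hidden])
E_ph = tf.placeholder(tf.float32, [n_hidden, n_obs])

# ... build the rest of the Edward model in terms of T_ph and E_ph ...

# inside the training loop, after a pyhsmm update has produced current_T
# and current_E:
# info_dict = inference.update(feed_dict={T_ph: current_T, E_ph: current_E})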

Of course, one caveat is that pyhsmm is written in Cython, so your mileage will vary depending on which updates you'd like to make distributed/"hogwild".

dustinvtran avatar Apr 08 '17 16:04 dustinvtran

I've been thinking about this in terms of option (2). I agree that distributing the existing code to the GPU could be difficult.

nfoti avatar Apr 08 '17 18:04 nfoti

Any update on this stuff?

refrigerator avatar Apr 18 '18 15:04 refrigerator

Does this work now?

rzu512 avatar Aug 18 '18 00:08 rzu512