HMM state probabilities

Open malonzm1 opened this issue 2 years ago • 4 comments

Hi!

Is there a way to find the state probabilities for each step in the sequence?

Thanks and good day.

malonzm1 avatar Sep 15 '22 09:09 malonzm1

Hi, if you are talking about an HMM, you can use predict_proba(); each component of the returned vector corresponds to a state ID. This uses the forward-backward algorithm. If you want only the forward pass, you can use hmm_model.forward().

teoML avatar Sep 15 '22 12:09 teoML

Thanks. I tried predict_proba() but when I took a look at the selected states they don't necessarily correspond to the highest probabilities (though most do). What am I missing?

Thanks.

malonzm1 avatar Sep 16 '22 06:09 malonzm1

I'm not sure if you're saying that the selected states from model.predict don't match the highest probabilities from model.predict_proba, or if the highest probability states in model.predict_proba don't match the highest probability states in model.forward, so I'll answer both.

First, the forward algorithm begins at the start of the sequence and aligns observations to states in the model. Each entry in the returned matrix is the probability of starting at the beginning of the sequence and aligning the observations so far to states in the model, summed over every path through the model that ends by aligning this observation to this state. The backward algorithm works much the same way, except that it begins by aligning the final observation to the end state and works backwards from there. The forward_backward algorithm, wrapped by predict_proba, combines these probabilities and then normalizes them per observation. It's basically asking, "given all paths aligning observations to states up to this point, and all paths aligning observations to states after this point, how likely is each state for this observation?" It has information that the forward algorithm does not have access to.
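To make the difference concrete, here is a minimal NumPy sketch of the forward and backward passes on a toy two-state model. It is not pomegranate's implementation: the function name forward_backward, the parameter values, and the observation sequence are all made up for illustration, there is no explicit end state, and it works in plain probability space, which is only safe for very short sequences.

```python
import numpy as np

def forward_backward(start, trans, emis, obs):
    """Return (forward, posterior) matrices, each of shape (T, n_states).

    start[i]    : probability of starting in state i
    trans[i, j] : probability of moving from state i to state j
    emis[i, k]  : probability that state i emits symbol k
    """
    T, n = len(obs), len(start)
    fwd = np.zeros((T, n))
    bwd = np.ones((T, n))  # no explicit end state: the last row stays at 1

    # Forward pass: probability of the observations up to t, ending in each state.
    fwd[0] = start * emis[:, obs[0]]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ trans) * emis[:, obs[t]]

    # Backward pass: probability of the observations after t, given each state at t.
    for t in range(T - 2, -1, -1):
        bwd[t] = trans @ (emis[:, obs[t + 1]] * bwd[t + 1])

    # Combine both passes and normalize per observation.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return fwd, post

# Hypothetical toy parameters, chosen only for illustration.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emis = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
obs = [0, 1, 1, 0]

fwd, post = forward_backward(start, trans, emis, obs)
filtered = fwd / fwd.sum(axis=1, keepdims=True)  # forward-only state probabilities

print(np.round(filtered, 3))
print(np.round(post, 3))
```

In this sketch the forward-only probabilities and the forward-backward probabilities agree at the last observation, where there is nothing left for the backward pass to add, and differ at earlier ones.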

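Second, the algorithm in model.predict is the Viterbi algorithm, which returns the single maximum-likelihood path through the model, whereas model.predict_proba returns per-observation probabilities from the forward-backward algorithm. The most likely state for one observation taken on its own does not have to lie on the single most likely path, so the two can disagree.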
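A tiny example of that disagreement, again as a NumPy sketch rather than pomegranate code. The three-state model and its numbers are hypothetical, picked so the effect shows up in two steps. The function names are my own, and the per-observation probabilities are computed by brute-force enumeration of every path instead of forward-backward, which gives the same values on a model this small.

```python
import itertools
import numpy as np

def viterbi(start, trans, emis, obs):
    """Return the single most likely state path as a list of state indices."""
    T, n = len(obs), len(start)
    delta = np.zeros((T, n))            # best path probability ending in each state
    back = np.zeros((T, n), dtype=int)  # best predecessor of each state
    delta[0] = start * emis[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * trans  # scores[i, j]: best path into i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emis[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def posteriors_by_enumeration(start, trans, emis, obs):
    """Per-observation state probabilities, summing over all paths (tiny models only)."""
    T, n = len(obs), len(start)
    post = np.zeros((T, n))
    for path in itertools.product(range(n), repeat=T):
        p = start[path[0]] * emis[path[0], obs[0]]
        for t in range(1, T):
            p *= trans[path[t - 1], path[t]] * emis[path[t], obs[t]]
        for t, s in enumerate(path):
            post[t, s] += p
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical model: state 0 always stays put, state 1 splits its mass two ways.
start = np.array([0.4, 0.6, 0.0])
trans = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0]])
emis = np.full((3, 2), 0.5)  # uninformative emissions
obs = [0, 0]

viterbi_path = viterbi(start, trans, emis, obs)
post = posteriors_by_enumeration(start, trans, emis, obs)
per_step_best = post.argmax(axis=1).tolist()

print(viterbi_path, per_step_best)
print(np.round(post, 3))
```

Here the best single path stays in state 0 for both steps, while the per-observation probabilities favor state 1 at the first step, because state 1's probability is spread over two different continuations. The per-step choices (state 1, then state 0) do not even form a valid path in this model, since the transition from state 1 to state 0 has probability zero.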

jmschrei avatar Sep 17 '22 07:09 jmschrei

Thanks! It is the first, the selected states from model.predict don't match the highest probabilities from model.predict_proba. I don't suppose there's a similar model.predict_proba function for model.predict that outputs state probabilities for the Viterbi algorithm?

malonzm1 avatar Sep 21 '22 05:09 malonzm1

Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.

jmschrei avatar Apr 16 '23 06:04 jmschrei