Would it be a desirable feature to allow training HMMs with soft labels?

complyue opened this issue 8 years ago · 4 comments

What I have in mind is to replace the forward/backward pass with the same algorithm but different observation values (which might be called "futuristic" values, since they are yet to be observed at their respective time positions), using predefined emission distributions, so that the trained HMM can be used for a sort of prediction.

If I understand correctly, pomegranate implemented hmm.summarize(algorithm='labeled') partly to support hmm.summarize(algorithm='viterbi'). However, I haven't seen Viterbi training commonly supported in other HMM packages, and labeled training seems even rarer, so I think pomegranate is simply ahead in this direction.

However, the current implementation of Viterbi/labeled training assumes hard labels. I think soft labels are essential to achieving my stated goal, unless there is a "shadow" training algorithm that takes parallel observation data together with another HMM that shares all states with the HMM being trained but has different (maybe frozen) emission distributions.

I'd like to poll your thoughts and experiences (which would be greatly appreciated) regarding this idea.

complyue · Feb 21 '17

I'm unsure what you mean by 'soft labels', but HMMs are usually trained using the Baum-Welch algorithm, which can be thought of as a temporal/structured version of EM. 'baum-welch' is the default and does not require labels at all. Can you define exactly what a soft label means in this context?
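For reference, an unsupervised fit needs nothing but the sequences themselves, along the lines of the sketch below (the pre-1.0 API; treat the exact signature as approximate, and the data as a placeholder):

```python
import numpy as np
from pomegranate import HiddenMarkovModel, NormalDistribution

# Two unlabeled univariate sequences (placeholder data).
X = [np.random.normal(0, 1, size=100),
     np.random.normal(5, 1, size=100)]

# Baum-Welch is the default: only the observations are required,
# no state labels of any kind.
model = HiddenMarkovModel.from_samples(
    NormalDistribution, n_components=2, X=X, algorithm='baum-welch')
```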

jmschrei · Feb 21 '17

By 'soft labels' I mean that, for each temporal point, the label is provided as a vector of probabilities over the states, rather than a definite state index (which I think should be called a 'hard label').
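For instance (a hypothetical illustration in NumPy, not an existing pomegranate input format), for a 3-state model over 4 time steps:

```python
import numpy as np

# Hard labels: one definite state index per time step.
hard_labels = np.array([0, 0, 1, 2])

# Soft labels: one probability vector over the 3 states per time step.
# Each row sums to 1; a hard label is just the degenerate one-hot case.
soft_labels = np.array([
    [1.0, 0.0, 0.0],   # certain: state 0
    [0.7, 0.3, 0.0],   # mostly state 0, maybe state 1
    [0.1, 0.8, 0.1],   # mostly state 1
    [0.0, 0.2, 0.8],   # mostly state 2
])
```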

complyue · Feb 22 '17

This is called a partial HMM. I am interested in such a model too. See my question here: https://github.com/hmmlearn/hmmlearn/issues/173 The answer pointed to a paper on partially observed HMMs.

chananshgong · Feb 23 '17

I think it would be a nice feature to add, and it doesn't seem that difficult either: simply provide a matrix W of size observations x states, where you input the prior probability of each observation belonging to each state, and multiply by it in the forward-backward algorithm. Is there a more intuitive way to specify the prior weights than a giant matrix, though? A rough sketch of the weighting is below.
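This sketch is an illustration only, not pomegranate code; forward_weighted and all parameter names here are made up. The forward recursion simply picks up a factor of W[t] at each step:

```python
import numpy as np

def forward_weighted(pi, A, B, obs, W):
    """Forward pass where each emission likelihood is scaled by the
    prior state-membership weights W (shape: len(obs) x n_states)."""
    n_states = len(pi)
    T = len(obs)
    alpha = np.zeros((T, n_states))
    # Initialization: initial distribution * emission * soft label at t = 0.
    alpha[0] = pi * B[:, obs[0]] * W[0]
    # Recursion: the standard forward update, again scaled by W[t].
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]] * W[t]
    return alpha

# Example: 2 states, 3 discrete symbols.
pi = np.array([0.6, 0.4])                    # initial distribution
A  = np.array([[0.7, 0.3], [0.2, 0.8]])      # transition matrix
B  = np.array([[0.5, 0.4, 0.1],              # emission probabilities
               [0.1, 0.3, 0.6]])
obs = [0, 1, 2]
W   = np.array([[1.0, 0.0],                  # hard label: state 0
                [0.5, 0.5],                  # uninformative
                [0.2, 0.8]])                 # soft label toward state 1
print(forward_weighted(pi, A, B, obs, W))
```

With W set to all ones this reduces to the standard forward algorithm, and one-hot rows recover hard labeling.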

jmschrei · Feb 23 '17

Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.

jmschrei · Apr 16 '23