IOHMM
How to do prediction?
How can I do prediction for new test data after training the model?
I have the same question. Did you solve it?
Referring to the Rabiner HMM tutorial ("https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf") and the hmmlearn library ("https://hmmlearn.readthedocs.io/en/latest/"), I found the following two solutions:
1- Through log_gammas (the posterior state distribution):

import numpy as np

# Most likely state at each time step, taken from the posterior (gamma)
# distribution of each training sequence.
state_sequences = []
for i in range(100):  # loop over the training sequences (100 in this example)
    for j in range(lengths[i]):
        state_sequences.append(np.argmax(np.exp(SHMM.log_gammas[i])[j]))

# Split the flat list back into one state sequence per unit.
pred_state_seq = [state_sequences[df[df['unit'] == i].index[0]:df[df['unit'] == i].index[-1] + 1]
                  for i in range(1, df_A['unit'].max() + 1)]
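A more compact sketch of the same idea, assuming SHMM.log_gammas is a list with one (sequence_length, n_states) array per sequence (the argmax of log_gamma equals the argmax of gamma, so the exp can be skipped):

import numpy as np

# Per time step, pick the state with the highest posterior probability.
# Assumes SHMM.log_gammas is a list of (sequence_length, n_states) arrays.
pred_state_seq = [np.argmax(lg, axis=1) for lg in SHMM.log_gammas]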
2- Viterbi Algorithm:
import numpy as np
from hmmlearn import _hmmc

# Build the transition matrix and the initial state distribution from the
# trained model's transition and initial-probability sub-models (the empty
# input array means the sub-models take no covariates here).
transmat = np.empty((num_states, num_states))
for i in range(num_states):
    transmat = np.concatenate((transmat, np.exp(SHMM.model_transition[i].predict_log_proba(np.array([[]])))))
transmat = transmat[num_states:]  # drop the uninitialized rows from np.empty
startprob = np.exp(SHMM.model_initial.predict_log_proba(np.array([[]]))).squeeze()

def log_mask_zero(a):
    """
    Compute the log of input probabilities, masking divide-by-zero in log.

    Notes
    -----
    During the M-step of the EM algorithm, very small intermediate start
    or transition probabilities can be normalized to zero, causing a
    *RuntimeWarning: divide by zero encountered in log*.
    This function masks this harmless warning.
    """
    a = np.asarray(a)
    with np.errstate(divide="ignore"):
        return np.log(a)

def _do_viterbi_pass(framelogprob):
    n_samples, n_components = framelogprob.shape
    state_sequence, logprob = _hmmc._viterbi(n_samples, n_components,
                                             log_mask_zero(startprob),
                                             log_mask_zero(transmat),
                                             framelogprob)
    return logprob, state_sequence

def _decode_viterbi(X):
    # Emission log-likelihoods of sequence X, as computed by the trained model.
    framelogprob = SHMM.log_Eys[X]
    return _do_viterbi_pass(framelogprob)

def decode():
    decoder = {"viterbi": _decode_viterbi}["viterbi"]
    logprob = 0
    sub_state_sequences = []
    for sub_X in range(100):  # the decoder works on a single sequence at a time
        sub_logprob, sub_state_sequence = decoder(sub_X)
        logprob += sub_logprob
        sub_state_sequences.append(sub_state_sequence)
    return logprob, np.concatenate(sub_state_sequences)

def predict():
    """
    Find the most likely state sequence for every training sequence.

    Returns
    -------
    logprob : float
        Sum of the Viterbi log probabilities over all sequences.
    state_sequence : array, shape (n_samples, )
        Most likely state label for each sample.
    """
    logprob, state_sequence = decode()
    return logprob, state_sequence

_, state_seq = predict()
pred_state_seq = [state_seq[df[df['unit'] == i].index[0]:df[df['unit'] == i].index[-1] + 1]
                  for i in range(1, df_A['unit'].max() + 1)]
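For what it's worth, the two approaches answer slightly different questions: the gamma-based decoding picks the individually most likely state at each time step, while Viterbi returns the single jointly most likely state path (see the Rabiner tutorial linked above), so they can occasionally disagree.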
Building on this, the log_gammas can be used to decode the sequence of hidden states.
To handle test data, one can point the model at the testing set and re-run the E-step to get a new set of log_gammas (this does not update the transition and emission parameters, so the trained transitions and emissions are still used). Using these new log_gammas, re-run the decoding as above.
USHMM.set_data([testing])  # attach the testing set to the trained model
USHMM.E_step()  # recompute the log_gammas for the testing set (parameters stay fixed)
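Putting these pieces together, a minimal sketch of test-time prediction might look like the following, assuming the test data is prepared as a list of DataFrames with the same covariate/output columns used for training (the DataFrame names here are hypothetical):

import numpy as np

# Hypothetical test DataFrames, one per test sequence; the columns must
# match those the model was trained with.
test_sequences = [test_df_1, test_df_2]

USHMM.set_data(test_sequences)  # attach the test data; trained parameters stay fixed
USHMM.E_step()                  # recompute log_gammas for the test sequences

# Gamma-based decoding, as above: per time step, pick the state with the
# highest posterior probability.
pred_state_seq = [np.argmax(lg, axis=1) for lg in USHMM.log_gammas]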