HMMBase.jl
Multiple sequences with different lengths
Hi!
I'm planning to use `HMMBase.jl` for a biological application.
Specifically, I want to estimate the parameters (e.g. speed) of an object that has multiple states.
My data consist of multiple coordinate sequences of different lengths.
I am new to Julia, so I'm not sure whether `HMMBase.jl` or `MS_HMMBase.jl` supports this kind of analysis.
If not, are there any future plans?
Thank you in advance! I'm sorry for a very silly question.
Hi,
Parameter estimation with multiple sequences (of potentially different lengths) is not currently supported in `HMMBase`. It's something that I would like to implement (in `mle.jl`), but I don't know when.
I'm not sure if variable-length sequences are supported by `MS_HMMBase`.
I'll keep this issue open as a reminder to try to implement this :-)
Thank you for your reply :)
At least I can implement the algorithm with a specific distribution, but I don’t know how to make it work with arbitrary distributions...
In HMMBase I make use of the `Distributions.jl` package for the `pdf` (likelihood) and `fit_mle` (parameter estimation) methods. This way I can handle any observation distribution that implements these methods.
I only need to compute the "responsibilities" (assignment probabilities) for each observation and each state, and re-estimate the transition matrix.
See https://github.com/maxmouchet/HMMBase.jl/blob/master/src/mle.jl#L58-L68 where I delegate to `fit_mle`.
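To make the role of the responsibilities concrete for several sequences of different lengths, here is a minimal sketch of a pooled, weighted parameter update. It is written in Python purely for illustration and is not HMMBase code: the function name `weighted_gaussian_mstep`, the univariate Gaussian observation model, and the toy data are all assumptions. The pooled weighted estimate plays roughly the role that a weighted fit of the observation distribution would play in the package.

```python
def weighted_gaussian_mstep(sequences, responsibilities, n_states):
    """Hypothetical helper: weighted Gaussian estimate per hidden state.

    sequences:        list of observation lists; lengths may differ.
    responsibilities: one list per sequence, with one row per time step,
                      row[k] = P(Z_t = k | Y, theta) for that sequence.
    Returns a list of (mean, variance) tuples, one per state.
    """
    # Pool all observations; sequence boundaries do not matter for this update.
    xs = [x for seq in sequences for x in seq]
    params = []
    for k in range(n_states):
        # Weights of state k for every pooled observation, in the same order as xs.
        ws = [row[k] for gam in responsibilities for row in gam]
        total = sum(ws)
        mean = sum(w * x for w, x in zip(ws, xs)) / total
        var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
        params.append((mean, var))
    return params


# Toy data (assumed): two sequences of lengths 3 and 2, hard assignments.
seqs = [[0.0, 0.2, 5.1], [4.9, 5.0]]
gammas = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
params = weighted_gaussian_mstep(seqs, gammas, 2)
```

Because the update only needs (observation, weight) pairs, sequences of unequal length need no padding here. Only the forward and backward passes have to be run separately on each sequence.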
Hi again, I'm now trying to integrate my code into this package.
BTW, I'm not sure what you mean by the responsibilities of each observation.
By "responsibilities" I mean `P(Z_t = i | Y, θ)`, where `Z_t` is the hidden state at time `t`, `Y` the observations, and `θ` the model parameters.
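For reference, this posterior is usually written in terms of the forward and backward messages. The notation below is the standard textbook one and is an assumption here: `α_t(i) = P(Y_1, …, Y_t, Z_t = i | θ)`, `β_t(i) = P(Y_{t+1}, …, Y_T | Z_t = i, θ)` (both unscaled), and `N` is the number of hidden states.

```latex
\gamma_t(i) \;=\; P(Z_t = i \mid Y, \theta)
\;=\; \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}
```

The denominator equals `P(Y | θ)` for every `t`, so each row of responsibilities sums to one.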
Thank you.
I have another question.
I was thinking that we scale `β` with the `c` calculated from `α`.
https://github.com/maxmouchet/HMMBase.jl/blob/f9928525d55e06321c8b22ffcb6c179c09fc52d1/src/messages.jl#L56-L67
Do I need to calculate another `c` for `β`?
You're right, it's possible to use the same scaling vector `c` for `α` and `β`.
It's not done, for now, to keep the code simple :-)
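As an illustration of the shared-scaling variant, here is a small sketch in Python. It is not the code in `messages.jl`: the function name `forward_backward`, the argument layout, and the toy numbers are assumptions, and it works on plain likelihoods rather than log-likelihoods to keep it short.

```python
import math


def forward_backward(a, A, L):
    """Hypothetical helper: scaled forward-backward with one scaling vector.

    a: initial state probabilities, length K.
    A: K x K transition matrix, A[i][j] = P(Z_{t+1} = j | Z_t = i).
    L: T x K likelihoods, L[t][j] = p(y_t | Z_t = j).
    Returns (alpha, beta, c, loglik), with alpha and beta scaled by c.
    """
    T, K = len(L), len(a)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[0.0] * K for _ in range(T)]
    c = [0.0] * T

    # Forward pass: normalise each row and remember the normaliser in c[t].
    for t in range(T):
        for j in range(K):
            if t == 0:
                prior = a[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prior * L[t][j]
        c[t] = sum(alpha[t])
        alpha[t] = [v / c[t] for v in alpha[t]]

    # Backward pass: reuse c from the forward pass, no second scaling vector.
    beta[T - 1] = [1.0] * K
    for t in range(T - 2, -1, -1):
        for i in range(K):
            s = sum(A[i][j] * L[t + 1][j] * beta[t + 1][j] for j in range(K))
            beta[t][i] = s / c[t + 1]

    # The scaling factors multiply to P(Y | theta).
    loglik = sum(math.log(v) for v in c)
    return alpha, beta, c, loglik


# Toy model (assumed numbers): 2 states, 3 observations.
a = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
L = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]
alpha, beta, c, loglik = forward_backward(a, A, L)

# With a shared c, the responsibilities are the plain element-wise product.
gamma = [[alpha[t][i] * beta[t][i] for i in range(2)] for t in range(3)]
```

One consequence of sharing `c` is that `alpha[t][i] * beta[t][i]` is already normalised, so no extra division is needed to obtain the responsibilities.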
Sorry for asking many questions!
Why do we have to subtract `m` from the log-likelihood?
https://github.com/maxmouchet/HMMBase.jl/blob/2ebf644152de2f2af6d23995601c6faf7b3bce8d/src/mle.jl#L39-L43
https://github.com/maxmouchet/HMMBase.jl/blob/2ebf644152de2f2af6d23995601c6faf7b3bce8d/src/messages.jl#L103-L123
No worries!
This is the log-sum-exp trick: https://en.wikipedia.org/wiki/LogSumExp
It prevents exp(LL[t,j]) from overflowing.
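The trick in isolation can be sketched as follows. This is a generic Python illustration, not the code linked above; the name `m` matches the question, while the function name `logsumexp` and the example values are assumptions.

```python
import math


def logsumexp(log_values):
    """Compute log(sum(exp(v))) without leaving the floating-point range."""
    m = max(log_values)
    if m == float("-inf"):
        # Every term is exp(-inf) = 0, so the sum is 0 and its log is -inf.
        return m
    # After subtracting m, the largest exponent is exp(0) = 1, so nothing
    # overflows and at least one term does not underflow to zero.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# Very negative log-likelihoods: math.exp(-1000.0) underflows to 0.0,
# so the naive math.log(sum(...)) would fail on log(0).
ll = [-1000.0, -1001.0, -1002.0]
stable = logsumexp(ll)

# Very large values: math.exp(1000.0) raises OverflowError,
# while the shifted version stays finite.
large = logsumexp([1000.0, 1000.0])

# Normalised weights recovered from the log domain.
weights = [math.exp(v - stable) for v in ll]
```

Subtracting `m` changes nothing mathematically, since it is added back outside the logarithm; it only moves the exponentials into a range that floating-point numbers can represent.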
I've resumed writing the code for multiple observations. I think I can send a PR soon.
- [x] rand
- [x] likelihoods
- [x] forward
- [x] backward
- [x] posterior
- [x] update_a!
- [x] update_A!
- [ ] update_B!
- [x] fit_mle
- [x] viterbi
Nice!
I recently cleaned up the code by removing the `logl` keyword and the methods that do not use the log-likelihoods.
Basically everything is done using the log-likelihoods now.
Feel free to open a PR, and I'll help you if there are merge conflicts.
Hi. I changed my code to adapt to your new API. I implemented some new functions assuming two situations:
- multiple observations with the same length
- multiple observations with different (random) lengths
I haven't finished writing the code for the multivariate model in situation 2. This notebook is an example.
Is it OK to open a PR? To be honest, this is my first time using GitHub, so I'm not sure when to open it...
I think the correct URL is https://nbviewer.jupyter.org/github/SosUts/HMMBase.jl/blob/multiple_sequences/notebooks/multiple%20sequences.ipynb :)
This looks very nice! Thank you for your work :)
You can open a PR now, and I'll review the code.
It is still possible to push new commits to your branch after the PR is opened, so it's no problem to make further modifications.
I opened it. Thank you as always!
Edit: I forgot to consider the tests. Should I change the tests, or should I close the PR and change the code?
No worries, you can keep the PR open!
Every commit that you add to your branch will be added to the PR automatically.
I'm a bit busy this week, so I'll try to have a look at the PR this weekend, or the next one.
In any case your code looks clean :)
Thanks to your original code! I'll try fixing things step by step.
Hello! I need this feature for some data I have, where I also have multiple time series of different lengths.
What's the current status? I saw the linked PR got closed, but I'm not sure why.
Thanks!