HMMBase.jl

Multiple sequences with different lengths

Open sosuts opened this issue 5 years ago • 17 comments

Hi! I'm planning to use HMMBase.jl for a biological application. Specifically, I want to estimate parameters (e.g. speed) of an object that has multiple states. My data consist of multiple coordinate sequences of different lengths.

I am new to Julia, so I'm not sure whether HMMBase.jl or MS_HMMBase.jl supports this kind of analysis. If not, are there any future plans?

Thank you in advance! I'm sorry for a very silly question.

sosuts avatar Feb 26 '20 10:02 sosuts

Hi,

Parameter estimation with multiple sequences (of potentially different lengths) is not currently supported in HMMBase. It's something that I would like to implement (in mle.jl), but I don't know when.

I'm not sure if variable length sequences are supported by MS_HMMBase.

I'll keep this issue open as a reminder to try to implement this :-)

maxmouchet avatar Feb 26 '20 15:02 maxmouchet

Thank you for your reply :)

At least I can implement the algorithm for a specific distribution, but I don't know how to make it work with arbitrary distributions...

sosuts avatar Feb 26 '20 15:02 sosuts

In HMMBase I make use of the Distributions.jl package for the pdf (likelihood) and fit_mle (parameter estimation) methods. This way I can handle any observation distribution that implements these methods.
I only need to compute the "responsibilities" (assignment probabilities) for each observation and each state, and re-estimate the transition matrix.
See https://github.com/maxmouchet/HMMBase.jl/blob/master/src/mle.jl#L58-L68 where I delegate to fit_mle.
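To make the division of labour concrete, here is a minimal Python sketch of a responsibility-weighted emission update for univariate Normal emissions. It is an illustration only, not HMMBase's code: `weighted_normal_mle` and `m_step_emissions` are hypothetical names, and the weighted Normal fit merely plays the role that the comment above assigns to fit_mle.

```python
import math

def weighted_normal_mle(observations, weights):
    """Weighted maximum-likelihood estimate (mean, std) of a univariate Normal."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    mean = sum(w * y for y, w in zip(observations, weights)) / total
    var = sum(w * (y - mean) ** 2 for y, w in zip(observations, weights)) / total
    return mean, math.sqrt(var)

def m_step_emissions(observations, gamma):
    """Re-estimate one Normal per state.

    gamma[t][i] is the responsibility of state i for observation t,
    so column i of gamma is the weight vector for state i.
    """
    n_states = len(gamma[0])
    return [
        weighted_normal_mle(observations, [row[i] for row in gamma])
        for i in range(n_states)
    ]
```

Because the update only sees observations and weights, swapping in another distribution means swapping in another weighted estimator; the rest of the EM loop does not change.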

maxmouchet avatar Feb 26 '20 16:02 maxmouchet

Hi again, I'm now trying to integrate my code into this package.

BTW, I'm not sure what you mean by the responsibilities for each observation.

sosuts avatar Apr 06 '20 09:04 sosuts

By "responsibilities" I mean P(Z_t = i | Y, θ), where Z_t is the hidden state at time t, Y the observations, and θ the model parameters.
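In terms of the forward and backward messages, this quantity has a standard closed form. The notation below is an assumption made for illustration (γ, α and β are not defined in this thread in this form): α_t(i) = P(Y_1, ..., Y_t, Z_t = i | θ) and β_t(i) = P(Y_{t+1}, ..., Y_T | Z_t = i, θ).

```latex
\gamma_t(i) \;=\; P(Z_t = i \mid Y, \theta)
\;=\; \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j} \alpha_t(j)\,\beta_t(j)}
```

The denominator equals P(Y | θ) for every t, so each row γ_t sums to one over the states.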

maxmouchet avatar Apr 06 '20 09:04 maxmouchet

Thank you. I have another question. I thought β could be scaled with the c computed from α:
https://github.com/maxmouchet/HMMBase.jl/blob/f9928525d55e06321c8b22ffcb6c179c09fc52d1/src/messages.jl#L56-L67
Do I need to compute a separate c for β?

sosuts avatar Apr 10 '20 02:04 sosuts

You're right, it's possible to use the same scaling vector c for α and β. It's not done, for now, to keep the code simple :-)
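For reference, a minimal Python sketch of the shared-scaling variant: the forward pass stores its normalizers c, and the backward pass divides by those same values. This is not the package's implementation; the function name and the argument layout (likelihoods passed as a T x K table `L`) are assumptions for illustration.

```python
import math

def forward_backward_shared_scaling(a, A, L):
    """Scaled forward-backward using one scaling vector c for both passes.

    a: initial distribution (length K)
    A: transition matrix (K x K), A[i][j] = P(Z_{t+1} = j | Z_t = i)
    L: likelihoods (T x K), L[t][i] = p(Y_t | Z_t = i)
    Returns (gamma, log_likelihood).
    """
    T, K = len(L), len(a)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[0.0] * K for _ in range(T)]
    c = [0.0] * T

    # Forward pass: normalize each row of alpha and remember the normalizer.
    for t in range(T):
        for j in range(K):
            if t == 0:
                prior = a[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prior * L[t][j]
        c[t] = sum(alpha[t])
        alpha[t] = [x / c[t] for x in alpha[t]]

    # Backward pass: reuse c from the forward pass instead of renormalizing.
    beta[T - 1] = [1.0] * K
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * L[t + 1][j] * beta[t + 1][j] for j in range(K)
            ) / c[t + 1]

    # With shared scaling the elementwise product is already the posterior.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
    log_likelihood = sum(math.log(x) for x in c)
    return gamma, log_likelihood
```

With this choice the product of the scaled messages needs no further normalization, and the log-likelihood is the sum of log c[t].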

maxmouchet avatar Apr 10 '20 10:04 maxmouchet

Sorry for asking so many questions! Why do we have to subtract m from the log-likelihood?

https://github.com/maxmouchet/HMMBase.jl/blob/2ebf644152de2f2af6d23995601c6faf7b3bce8d/src/mle.jl#L39-L43

https://github.com/maxmouchet/HMMBase.jl/blob/2ebf644152de2f2af6d23995601c6faf7b3bce8d/src/messages.jl#L103-L123

sosuts avatar May 01 '20 08:05 sosuts

No worries!

This is the log-sum-exp trick: https://en.wikipedia.org/wiki/LogSumExp
It prevents exp(LL[t,j]) from overflowing.
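A minimal Python sketch of the trick, independent of the package's code (the function name `logsumexp` is chosen here for illustration). Subtracting the maximum m makes the largest term exp(0) = 1, so the sum can neither overflow nor collapse to zero; m is added back after taking the log.

```python
import math

def logsumexp(log_values):
    """Compute log(sum(exp(v) for v in log_values)) without overflow."""
    m = max(log_values)
    if m == -math.inf:
        # Every term is exp(-inf) = 0, so the sum is 0 and its log is -inf.
        return -math.inf
    # After the shift, every exponent is <= 0 and the largest one is exactly 0.
    return m + math.log(sum(math.exp(v - m) for v in log_values))
```

For example, `logsumexp([1000.0, 1000.0])` evaluates without error, whereas calling `math.exp(1000.0)` directly raises `OverflowError`.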

maxmouchet avatar May 04 '20 16:05 maxmouchet

I've resumed writing the code for multiple observations. I think I can send a PR soon.

  • [x] rand
  • [x] likelihoods
  • [x] forward
  • [x] backward
  • [x] posterior
  • [x] update_a!
  • [x] update_A!
  • [ ] update_B!
  • [x] fit_mle
  • [x] viterbi

sosuts avatar Jun 13 '20 07:06 sosuts

Nice!

I recently cleaned up the code by removing the logl keyword and the methods that do not use the log-likelihoods. Basically everything is done using the log-likelihoods now.

Feel free to open a PR, and I'll help you if there are merge conflicts.

maxmouchet avatar Jun 15 '20 12:06 maxmouchet

Hi. I changed my code to adapt to your new API. I implemented some new functions for two situations:

  1. multiple observations of the same length
  2. multiple observations of different (random) lengths

I haven't finished writing the code for the multivariate model in situation 2. This notebook is an example.
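Independently of the notebook, the standard way to handle several sequences (assuming they are independent given the parameters) is to run forward-backward on each one separately and pool the expected counts before normalizing. A minimal Python sketch follows; the function names and nested-list layout are hypothetical and not taken from the PR.

```python
def pool_transitions(xi_per_sequence, n_states):
    """Re-estimate the transition matrix from several sequences.

    xi_per_sequence[k][t][i][j] = P(Z_t = i, Z_{t+1} = j | Y_k, θ) for
    sequence k. Sequences may have different lengths, since each one
    simply contributes T_k - 1 terms to the pooled counts.
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for xi in xi_per_sequence:
        for xi_t in xi:
            for i in range(n_states):
                for j in range(n_states):
                    counts[i][j] += xi_t[i][j]
    A = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no expected outgoing transitions")
        A.append([x / total for x in row])
    return A

def pool_initial(gamma_per_sequence):
    """Re-estimate the initial distribution as the average of the
    first-step responsibilities gamma_k[0] over all sequences."""
    n_seq = len(gamma_per_sequence)
    n_states = len(gamma_per_sequence[0][0])
    return [
        sum(gamma[0][i] for gamma in gamma_per_sequence) / n_seq
        for i in range(n_states)
    ]
```

The emission update pools in the same way: concatenate the observations and their responsibilities across sequences and run the weighted fit once per state.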

Is it OK to open a PR? To be honest, this is my first time using GitHub, so I'm not sure when to open it...

sosuts avatar Jul 13 '20 09:07 sosuts

I think the correct URL is https://nbviewer.jupyter.org/github/SosUts/HMMBase.jl/blob/multiple_sequences/notebooks/multiple%20sequences.ipynb :)

This looks very nice! Thank you for your work :)

You can open a PR now, and I'll review the code.
It is still possible to push new commits to your branch after the PR is opened, so making further modifications is no problem.

maxmouchet avatar Jul 14 '20 13:07 maxmouchet

I opened it. Thank you as always!

Edit: I forgot to think about the tests. Should I change the tests, or should I close the PR and change the code?

sosuts avatar Jul 15 '20 08:07 sosuts

No worries, you can keep the PR open!
Every commit that you add to your branch will be added to the PR automatically.

I'm a bit busy this week, so I'll try to have a look at the PR this weekend, or the next one.
In any case your code looks clean :)

maxmouchet avatar Jul 16 '20 13:07 maxmouchet

Thanks to your original code! I'll try fixing things step by step.

sosuts avatar Jul 17 '20 08:07 sosuts

Hello! I need this feature for some data I have, where I also have multiple time series of different lengths.

What's the current status? I saw the linked PR got closed, but I'm not sure why.

Thanks!

cossio avatar Jan 06 '23 18:01 cossio