
Ideas

Open maxmouchet opened this issue 5 years ago • 2 comments

Specialized transition matrix types

  • Classical dense transition matrix (Array{Float64,2})
  • Left-right
  • Sparse

Implementations of forward/backward/MLE that dispatch on the transition matrix type.
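
A rough sketch of how such dispatch could look (the forward_step name and recursion layout are illustrative, not HMMBase's actual internals):

```julia
using LinearAlgebra, SparseArrays

# One (unscaled) forward step: α′[j] = b[j] * Σ_i α[i] * A[i, j]

# Generic fallback, valid for any transition matrix type.
forward_step(α::Vector, A::AbstractMatrix, b::Vector) = (transpose(A) * α) .* b

# Left-right models: A is upper triangular, so only entries with i ≤ j
# contribute and the inner sum can be truncated.
function forward_step(α::Vector, A::UpperTriangular, b::Vector)
    K = length(α)
    α′ = zeros(promote_type(eltype(α), eltype(A)), K)
    for j in 1:K, i in 1:j
        α′[j] += α[i] * A[i, j]
    end
    return α′ .* b
end

# Sparse transition matrices already get a fast path through the fallback,
# since transpose(A) * α dispatches to SparseArrays' mat-vec kernel.
```

A left-right (Bakis) matrix could simply be stored as an UpperTriangular wrapper around a dense matrix, so dispatch alone may be enough without introducing new types.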

Higher-order models

As per https://discourse.julialang.org/t/ann-hmmbase-jl-a-lightweight-and-efficient-hidden-markov-model-abstraction/21604/2:

dellison:

The reason that I’m asking is that in natural language processing, HMMs have been used for sequence labeling tasks like part-of-speech tagging. In this task, for example, a second-order HMM that conditions on the previous two tags generally seems to do better than a first-order model that just conditions on one previous tag.

maxmouchet:

I’m not very familiar with higher-order models, but right now I see two ways of implementing them:

1. Writing specialized versions of the functions (e.g. messages_forwards_2);
2. Implementing more generic algorithms (such as belief propagation) to handle models of arbitrary orders in a single function.

The second option seems cleaner but I'm worried about the performance implications. On the other hand, macros could be used to generate specialized functions for an arbitrary order instead.

As for the types, I can add a new parameter like AbstractHMM{F<:VariateForm, N} where N represents the order (1 by default).
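
For illustration, a hedged sketch of what that extra order parameter could look like (only AbstractHMM and VariateForm come from the existing discussion; the concrete type and names are hypothetical):

```julia
using Distributions

# Hypothetical: an extra type parameter N for the model order (1 by default).
abstract type AbstractHMM{F<:VariateForm,N} end

const FirstOrderHMM{F<:VariateForm} = AbstractHMM{F,1}   # current behaviour

# A possible concrete type: the transition "matrix" becomes a tensor of
# order N + 1, A[i₁, …, i_N, j], conditioning on the N previous states.
struct OrderNHMM{F<:VariateForm,N,T<:Real,D<:Distribution{F}} <: AbstractHMM{F,N}
    a::Vector{T}   # initial distribution
    A::Array{T}    # transition tensor, K^(N+1) entries
    B::Vector{D}   # observation distributions
end

# Message passing could then dispatch on N, e.g.
# messages_forwards(hmm::AbstractHMM{<:VariateForm,1}, ...)  # current algorithm
# messages_forwards(hmm::AbstractHMM{<:VariateForm,2}, ...)  # second-order version
```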

Implement serialization/deserialization

Array(HMM), Dict, JSON, ...
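
As a starting point, a hedged sketch using JSON.jl and the existing a/A/B fields (the to_dict/from_dict helpers are hypothetical, and only Normal observation distributions are covered here):

```julia
using HMMBase, Distributions, JSON

# Hypothetical helpers: serialize an HMM to a plain Dict / JSON string.
to_dict(d::Normal) = Dict("type" => "Normal", "mu" => d.μ, "sigma" => d.σ)
to_dict(hmm::HMM) = Dict(
    "a" => hmm.a,
    "A" => collect(eachrow(hmm.A)),   # JSON has no matrix type; store rows
    "B" => map(to_dict, hmm.B),
)

from_dict_dist(d) = Normal(d["mu"], d["sigma"])  # inverse, Normal only

function from_dict(d)
    A = reduce(vcat, (permutedims(Float64.(row)) for row in d["A"]))
    HMM(Float64.(d["a"]), A, [from_dict_dist(b) for b in d["B"]])
end

hmm  = HMM([0.9 0.1; 0.2 0.8], [Normal(0, 1), Normal(10, 1)])
str  = JSON.json(to_dict(hmm))     # serialize
hmm2 = from_dict(JSON.parse(str))  # deserialize
```

Arbitrary observation distributions would need their own (de)serialization rules, which is probably the hard part.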

Others

  • Implement iterate(HMM), to be able to write forward(hmm...) instead of forward(hmm.a, hmm.A, hmm.B); see the sketch after this list.
  • HMM from Mixture Model (and use that to train HMMs with Mixture Models/MMs alone)
  • MLJ integration
  • Export to Turing model
  • More examples (e.g. https://nbviewer.jupyter.org/github/jmschrei/yahmm/blob/master/examples/Global%20Sequence%20Alignment.ipynb)
  • Allow reusing c and computing unscaled messages (set c to 1?)
  • ~Always use logl everywhere? (#22)~
  • NLP/Biology examples (see https://www.nltk.org/api/nltk.tag.html#module-nltk.tag.hmm)
  • Import/Export with Turing for MCMC training
  • Turing/Neural HMM demo
  • Remove useless type annotations (see https://white.ucc.asn.au/2020/04/19/Julia-Antipatterns.html)
  • Index is broken in GH pages doc.
  • Link to source is missing in doc.
  • Document custom MLE functions (see #25 and #29)
  • Raise error when the data dimensions are incorrect in fit_mle (see #29)
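
For the iterate(HMM) item above, a minimal sketch of the splatting idea (field order matches the forward(hmm.a, hmm.A, hmm.B) call in that item):

```julia
using HMMBase

# Yield the fields in the order expected by forward(a, A, B),
# so that forward(hmm...) splats into the right positions.
function Base.iterate(hmm::HMM, state::Int = 1)
    state == 1 && return (hmm.a, 2)
    state == 2 && return (hmm.A, 3)
    state == 3 && return (hmm.B, 4)
    return nothing
end
Base.length(::HMM) = 3

# forward(hmm...) now expands to forward(hmm.a, hmm.A, hmm.B),
# as described in the item above.
```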

maxmouchet · Apr 01 '19 09:04

Hi @maxmouchet, and thanks for this awesome package! Has there been any progress on MLJ.jl integration?

azev77 · Jul 28 '20 04:07

Hi,

Thanks for the kind words! Unfortunately, I haven't made much progress.

Currently MLJ.jl does not handle time series (see https://github.com/alan-turing-institute/MLJ.jl/issues/303).
I see that there is some early work in https://github.com/alan-turing-institute/MLJTime.jl, but I haven't yet looked into how I can adapt HMMBase to that.

I need to update the README, but currently my focus for v1.1 is support for multiple observation sequences (thanks to https://github.com/maxmouchet/HMMBase.jl/pull/26).

maxmouchet · Jul 28 '20 17:07