hsmmlearn
How to Specify Negative Binomial Distribution as Duration?
Thanks for the nice work.
In the R version of HSMM, we can specify a negative binomial distribution for the durations like this:
hsmm(sim$obs, od = "norm", rd = "nbinom")
But according to the API, it seems to only support NumPy arrays as input. Is there an easy way to use a distribution from scipy.stats directly?
Thanks
It seems to me that the code is structured in a way that fundamentally assumes that durations are discrete. To change this, one would have to change the way durations work to mirror the way emissions work.
@yanpanlau There's currently no way to supply a scipy distribution directly as input, though I agree there should be one. You can work around this by instantiating the distribution and evaluating its PMF directly on a range of durations (this is what the R version of the code does internally):
>>> from scipy.stats import nbinom
>>> import numpy as np
>>>
>>> x = np.arange(100) # or some (large) cutoff
>>> dist = np.vstack([
... nbinom.pmf(x, 10, 0.5),
... nbinom.pmf(x, 20, 0.5),
... nbinom.pmf(x, 40, 0.5)
... ])
>>> dist.shape
(3, 100)
>>> dist /= dist.sum(axis=1, keepdims=True)
>>> dist
array([[ 9.76562500e-04, 4.88281250e-03, 1.34277344e-02,
2.68554687e-02, 4.36401367e-02, 6.10961914e-02,
7.63702393e-02, 8.72802734e-02, 9.27352905e-02,
(...)
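The same workaround can be wrapped in a small helper that takes one frozen scipy distribution per state instead of separate `nbinom.pmf` calls. This is only a sketch: `duration_matrix` is a hypothetical name, not part of hsmmlearn, and it simply follows the `np.arange(100)` grid used above. Whether column j should stand for a duration of j or j + 1 steps depends on hsmmlearn's indexing convention, which this sketch does not check.

```python
import numpy as np
from scipy.stats import nbinom


def duration_matrix(dists, max_duration=100):
    """Hypothetical helper (not part of hsmmlearn).

    Stacks the PMFs of frozen discrete scipy distributions into an
    (n_states, max_duration) array whose rows sum to 1. Column j holds
    the truncated, renormalized probability that dists[i] equals j, i.e.
    the same 0-based grid as np.arange(max_duration); check this against
    hsmmlearn's own duration indexing before relying on it.
    """
    support = np.arange(max_duration)
    pmfs = np.vstack([d.pmf(support) for d in dists])
    # Truncating at max_duration drops the tail mass, so renormalize each row.
    return pmfs / pmfs.sum(axis=1, keepdims=True)


durations = duration_matrix([nbinom(10, 0.5), nbinom(20, 0.5), nbinom(40, 0.5)])
print(durations.shape)  # (3, 100)
```

Passing frozen distributions keeps each state's parameters together, and the cutoff only needs to be large enough that little probability mass is lost in the tail.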
@StellaAthena Not sure I understand what the link is with discrete versus continuous distributions. Can you elaborate?
Adding direct support for distributions should be fairly straightforward.
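One way such support might look, as a hypothetical sketch rather than hsmmlearn's actual API: a coercion step that passes existing arrays through unchanged and converts a sequence of frozen scipy distributions into a PMF matrix. It rejects continuous distributions, since durations are modelled as discrete here. The name `as_durations` and the `max_duration` cutoff are assumptions for illustration.

```python
import numpy as np
from scipy.stats import nbinom, norm, rv_discrete


def as_durations(durations, max_duration=100):
    """Hypothetical coercion helper (not part of hsmmlearn).

    Accepts either an array-like of duration probabilities or a sequence
    of frozen scipy distributions, and returns a row-normalized array.
    """
    if all(hasattr(d, "dist") for d in durations):
        # Frozen scipy distributions keep their generator in `.dist`.
        support = np.arange(max_duration)
        rows = []
        for d in durations:
            if not isinstance(d.dist, rv_discrete):
                raise TypeError("duration distributions must be discrete")
            rows.append(d.pmf(support))
        arr = np.vstack(rows)
    else:
        # Assume an array-like of (unnormalized) duration probabilities.
        arr = np.asarray(durations, dtype=float)
    return arr / arr.sum(axis=1, keepdims=True)


# Usage: both forms give an (n_states, n_durations) array with rows summing to 1.
from_dists = as_durations([nbinom(10, 0.5), nbinom(20, 0.5)])
from_array = as_durations(np.array([[1.0, 1.0], [1.0, 3.0]]))

try:
    as_durations([norm(5.0, 1.0)])  # continuous, so it is rejected
except TypeError as err:
    print(err)
```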
@jvkersch I had a brain fart and thought that @yanpanlau was asking about using a continuous distribution for the durations.