pomegranate
pomegranate copied to clipboard
Added evidence-based Gibbs sampling to Bayesian Networks
Although the code could be made more efficient by leveraging numpy, this basic implementation of Gibbs sampling works adequately for plenty of purposes.
Thanks for the contribution. I'm trying to understand the code now. I'll get back to you soon about this PR.
It would also be helpful if you could add some unit tests to ensure that sampling (with a fixed seed) gives the expected behavior.
Hi,
Some colleagues and I are really interested in this and we are willing to contribute. I tried the pull requests and noticed breaking issues that need correcting and testing.
Since it has been proposed several month ago, we would like to propose a new pull requests. Is that fine with every one ?
That's fine with me. If you could add unit tests and appropriate documentation on your additions as well that would be great.
Ok, I have a fisrt implementation using rejection sampling in order to take into account evidence on non marginal nodes.
I have resulted to use networkx topological sort to order the nodes. I didn't manged to find it else where in pomegranate, did I missed something ?
I'll make a pull requests once tests are in place.
As for API, this is what I have for now, let me know if this depart too much from what you had in mind :
def sample(self, n=1, evidences=[{}],min_prob=0.01):
"""Sample the network, optionally given some evidences
Use rejection to condition on non marginal nodes
Parameters
----------
n : int, optional
The number of samples to generate. Defaults to 1.
evidences : list of dict, optional
Evidence to set constant while samples are generated.
min_prob : stop iterations when Sum P(X|Evidence) < min_prob. generated samples for a given evidence will be
incomplete (<n)
Returns
-------
a nested list of sampled states
"""
```
Ok, so rejection sampling was easy.
Gibbs sampling, on the other hand require a little bit more work to build the transition kernel of the markov chains.
Nothing impossible, but I stumbled upon an issue when data do not contains all possible combination. This break distribution.joint() and marginal()
The issue seems to be adressed by https://github.com/jmschrei/pomegranate/pull/680
Do you know if/when this could be merged ?
Edit : nevermind, using model.graph.state.distribution instead of model.state.distribution did the trick without having to re-estimate marginals.
What is the status of this? Are you looking into adding Gibbs sampling or do you want to move forward with this addition? If so, can you add unit tests?
I started with rejection sampling wich is easy to implement but quite slow as soon as evidences are specified on non root nodes, due to a large rate of rejection. This is working for 2 weeks in my fork, but tests still need to be implemented.
If you want to look : https://github.com/cstb/pomegranate
I also started to implement gibbs sampling, which is a bit more involved than rejection sampling. I tried a couple of unsuccesfull approches, but I am happy to report that I am finally getting there, and I expect to submit a pull requests in the next couple of week.
I'm currently comparing results with the rejection sampler to make sure everything is ok.
I still need to work on seed management as I don't rely on numpy for weigthed choice but on a cython function.
On a 12 nodes network with an average of 2-3 parent by node, the Gibbs sampler takes 150 ms to draw 1000 samples without evidence. The more evidence, the faster it goes. The algorithm execution time is roughly linear with
- the number of samples
- the number of unknown nodes
- the number of edges in the markov blanket of unknown nodes
So far, on the network I tried, I did not observe the need to dicards burn in samples, but it is a bit early to be definitive.
By the way, I should probably have not relied on that assumption, but is there a good reason why distribution.keys() and keys in the distribution.keymap are sometimes ordered the same, and sometimes they are not ?
Hi @pascal-schetelat Firstly, many thanks for this contribution.
Is the gibbs sampler for BN's has an inplementation yet?
I just made the PR here : https://github.com/jmschrei/pomegranate/pull/778
Great! @pascal-schetelat
Looking foward to using it :)
Thank you for your contribution. However, pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython, and so this PR is unfortunately out of date and will be closed.