Integrate `pgmpy` for Bayesian network capabilities
Hey @ceteri,
I need some pointers to understand this requirement better.
Thanks in advance.
Thank you @Ankush-Chander! Here's an idea, if this seems reasonable as an approach?
There are several kinds of modeling, sampling, and inference implemented by pgmpy, although our shortest path is probably to focus on discrete Bayesian networks? This is also one of the top-requested features to add to kglab from our ongoing survey.
Next steps are:
- Build an example Discrete Bayesian model in `pgmpy` that produces known results, which we can use to verify the integration later
  - for example, using one of the examples given in their documentation
  - or, ideally, based on data in the recipe progressive example that we use
- Represent the data from this model in an RDF graph
- Develop a new class method for `kglab.KnowledgeGraph` or probably even better for `kglab.Subgraph` that loads the `pgmpy` model data from the KG
- Verify results from above, to use as a unit test
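For the first step, one library-free way to get "known results" is exact inference by enumeration on a tiny network. This is only a minimal sketch: the three-variable network (`Rain`, `Sprinkler`, `WetGrass`) and all of its probabilities are made up for illustration and come from neither the pgmpy documentation nor the recipe data. The resulting numbers could later be compared against whatever pgmpy returns for the equivalent model.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all variables binary).
PARENTS = {
    "Rain": (),
    "Sprinkler": (),
    "WetGrass": ("Rain", "Sprinkler"),
}

# Each CPT maps a tuple of parent values to P(variable = 1 | parents).
# The probabilities are invented for this sketch.
CPT = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.5},
    "WetGrass": {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0},
}


def joint(assignment):
    """Probability of one full assignment, using the chain rule over the DAG."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] == 1 else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = 1 | evidence), summing the joint over all hidden variables."""
    names = list(PARENTS)
    totals = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[target]] += joint(assignment)
    return totals[1] / (totals[0] + totals[1])


prior_wet = query("WetGrass", {})
posterior_rain = query("Rain", {"WetGrass": 1})
```

With these numbers, P(WetGrass = 1) is 0.35 and P(Rain = 1 | WetGrass = 1) is 0.15 / 0.35 = 3/7, which is easy to check by hand and so makes a reasonable unit-test baseline.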
We can also decide whether to have some additional wrappers for pgmpy and its results. On the one hand, it's great to wrap results into pandas dataframes and other conveniences for data science workflows. On the other hand, it's probably better to allow people to simply use pgmpy operations on the model directly. The latter approach is how we've handled integration of PyTorch, PyVis, etc., i.e., we don't add an intermediate layer unless there are pain points that need to be corrected (as with SPARQL queries).
How does that sound as an approach?
Hey @ceteri
I tried to follow the trail above, but I was not able to find any widely accepted standard RDF representation of Bayesian networks. I will need your help with that.
Once we pinpoint that, we can give users a pathway from a standard BN RDF file to a KG to a pgmpy model. The rest of the operations can be done directly using pgmpy endpoints.
Thanks
Hi @Ankush-Chander, good point! The way I described it above, moving from RDF => pgmpy wouldn't work directly, and there's no standard representation.
What I should have described better:
1. Choose a simple example Bayesian network problem
2. Build a solution for it in `pgmpy`, so we have a known baseline to test against
3. At that point, I'll represent in RDF (as idiomatic as possible; this becomes simpler after RDF-star is available)
4. Then we can scope how best to use the `Subgraph` classes to transform into `pgmpy`
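To make the RDF step a bit more concrete, here is a rough sketch of a round trip from a small network to triples and back. It rests on several assumptions: plain Python tuples stand in for RDF triples, the `bn:` vocabulary and `ex:` names are hypothetical (there is no standard to follow), and `to_triples` / `from_triples` are illustrative helpers, not part of kglab or pgmpy. The example network and its probabilities are invented as well.

```python
from collections import defaultdict

# Hypothetical network; parent tuples are kept in alphabetical order so the
# round trip below can rebuild the same CPT keys without relying on triple order.
PARENTS = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
CPT = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.5},
    "WetGrass": {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0},
}


def to_triples(parents, cpt):
    """Flatten the network into (subject, predicate, object) tuples."""
    triples = []
    for var, pars in parents.items():
        triples.append((f"ex:{var}", "rdf:type", "bn:Variable"))
        for p in pars:
            triples.append((f"ex:{var}", "bn:hasParent", f"ex:{p}"))
        # One node per CPT row plus one per condition: this is the
        # reification-style overhead that RDF-star might reduce.
        for i, (given, prob) in enumerate(sorted(cpt[var].items())):
            row = f"ex:{var}_row{i}"
            triples.append((row, "bn:forVariable", f"ex:{var}"))
            triples.append((row, "bn:probTrue", prob))
            for p, value in zip(pars, given):
                cond = f"{row}_{p}"
                triples.append((row, "bn:given", cond))
                triples.append((cond, "bn:variable", f"ex:{p}"))
                triples.append((cond, "bn:value", value))
    return triples


def local(term):
    """Strip the prefix from a prefixed name, e.g. 'ex:Rain' -> 'Rain'."""
    return term.split(":", 1)[1]


def from_triples(triples):
    """Rebuild (parents, cpt) from the triples, independent of their order."""
    index = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> objects
    for s, p, o in triples:
        index[s][p].append(o)
    parents = {
        local(s): tuple(sorted(local(p) for p in po.get("bn:hasParent", [])))
        for s, po in index.items()
        if "bn:Variable" in po.get("rdf:type", [])
    }
    cpt = {name: {} for name in parents}
    for s, po in list(index.items()):
        if "bn:forVariable" not in po:
            continue
        var = local(po["bn:forVariable"][0])
        conds = {
            local(index[c]["bn:variable"][0]): index[c]["bn:value"][0]
            for c in po.get("bn:given", [])
        }
        cpt[var][tuple(conds[p] for p in parents[var])] = po["bn:probTrue"][0]
    return parents, cpt


TRIPLES = to_triples(PARENTS, CPT)
parents, cpt = from_triples(TRIPLES)
edges = sorted((p, child) for child, pars in parents.items() for p in pars)
```

The `edges` list has the `(parent, child)` shape that pgmpy's network constructors accept; the CPT values would still need converting into pgmpy's own CPD objects, and a real integration would read the triples from the KG rather than from a Python list.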
If the selected example problem can involve the "progressive example" of recipes used in the tutorial, that would be ideal, although it isn't necessary for us to build out a first integration. The priority is that the initial test case should be simple. We can always construct recipe examples later :)
Does that describe the problem better?
The intention for this is to illustrate how to use a completely different graph technology (Bayesian networks) on graph data, which can complement the other approaches we have with NetworkX, RDFlib, pslpython, PyTorch, etc.
Many thanks, Paco
Hey @ceteri,
Took a while to get my head around Bayesian inferencing.
Here's the test example.
P.S. The original cancer model, although simple, made some very gloomy assumptions, so I had to choose something positive :) I hope it's simple enough for our purpose.
3. At that point, I'll represent in RDF (as idiomatic as possible; this becomes simpler after RDF-star is available)
Any pointers on step 3 will be helpful for me to continue.
Thanks in advance, Ankush
Wonderful, thank you @Ankush-Chander!
Now I get to wrangle with some RDF representation, hopefully without too much reification required :)