Continuous data
Hi again!
I was thinking of using Bayesian networks for continuous data. Do you think an implementation that works with this kind of data would be possible?
Thanks!
At the moment bnlearn can only be used for discrete/categorical analysis. Implementing models that can work with continuous values is on my (long) todo list.
Nevertheless, there are ways to model your data if it is continuous. You can discretize your variables using domain/expert knowledge. For example, if you have temperature, you can mark temperature < 0 as freezing, and temperature >= 0 as normal. Or you can use many more, smaller categories.
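The temperature example can be sketched with pandas as follows. This is only an illustration of manual discretization before handing the data to a discrete model: the column name `temperature`, the sample values, and the choice to count exactly 0 as "normal" are assumptions made for the example, not something bnlearn prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical temperature readings in degrees Celsius (made-up values).
df = pd.DataFrame({"temperature": [-12.5, -0.1, 0.0, 3.4, 21.0]})

# Bin edges chosen from domain knowledge: below 0 is "freezing", 0 and above is "normal".
# right=False makes each bin closed on the left, so 0.0 itself falls into "normal".
bins = [-np.inf, 0, np.inf]
labels = ["freezing", "normal"]
df["temperature_cat"] = pd.cut(df["temperature"], bins=bins, labels=labels, right=False)

print(df)
```

Adding more edges to `bins` (and matching entries to `labels`) gives the finer categories mentioned above.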
Any update as of today on implementing support for continuous values?
Not yet but I'm planning to give it a start soonish. If you have any suggestions, working examples or methods to start with, let me know.
Yes. Happy to help with examples. I am currently modeling soil corrosivity in Python. Please let me know when you would like to start so I can prepare an easy-to-start example.
Cheers for your work @erdogant. Do you have any kind of ETA on this? :-)
Yes, soonish! Hopefully shortly after the summer break in August! #wishfulthinking
Please count on me for adding the feature to the model. Thanks
@erdogant Have you seen this continuous approach?
Thank you for spotting this repo! But the repo also seems to support only discrete models.
I am thinking of applying their continuous approach to bnlearn!
If you can get it up and running with an example for continuous data, it would be great! But the name of the repo I am seeing is [LearnDiscreteBayesNets.jl], which hints towards discrete models. Can you paste the name of the function(s) of interest so I can be sure I am looking at the right repo?
@samanemami I think the function you are referring to simply discretizes the data before learning the network.
@fimselamse Yes, that is the idea. Couldn't we use it?
Very interesting though! I am going to read the paper first :-)
@samanemami Yes, this is definitely a way to create a network from continuous data. However, it is not what's being done in the R bnlearn library. They use a special Gaussian case of the scoring function, if I understand it correctly.
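To illustrate what a Gaussian scoring function can look like, here is a minimal sketch of a BIC score for a single linear-Gaussian node, where the child is modeled as a linear function of its parents plus normal noise. The function name `gaussian_bic` and the synthetic data are assumptions made for this example; it is not taken from the R bnlearn source and may differ from it in detail.

```python
import numpy as np

def gaussian_bic(child, parents):
    """BIC of one node under a linear-Gaussian model (illustrative sketch).

    child:   1-D array of n observations.
    parents: 2-D array of shape (n, p); p may be 0 for a node without parents.
    """
    n = child.shape[0]
    # Design matrix: intercept column plus one column per parent.
    X = np.hstack([np.ones((n, 1)), parents])
    coef, *_ = np.linalg.lstsq(X, child, rcond=None)
    resid = child - X @ coef
    sigma2 = resid @ resid / n  # maximum-likelihood estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1  # regression coefficients plus the variance
    return loglik - 0.5 * k * np.log(n)

# Synthetic data: y depends linearly on x, so the edge x -> y should score higher.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)

score_with_parent = gaussian_bic(y, x.reshape(-1, 1))
score_without_parent = gaussian_bic(y, np.empty((500, 0)))
print(score_with_parent > score_without_parent)
```

A structure search would sum such per-node scores over the whole graph and prefer the graph with the highest total, so no discretization step is needed.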
Perfect @erdogant, count on me.
@fimselamse, Yes. This is not the method used in R, but this package could add other methods to handle the continuous data as well.
@samanemami Definitely! I have tried some information-preserving discretization methods myself. Some results were promising, but not as good as the method used in R. Would love to see the results with this one, though!
I have been looking into this piece of code and it is written in Julia (I was hoping for Python #wishfulthinking). Maybe not a huge issue, but manually converting it is error-prone and time-consuming. I did find documentation about converting Julia to Python. I will give that a try first.
If someone here is familiar with Julia, it would be very helpful!
@erdogant regarding real case examples of continuous data (e.g., corrosion), please let me know if I can be of help. Interested to test them using bnlearn. When would you expect to start the new version testing?
I added the Julia code to the GitHub repo. I attempted to manually rewrite it in Python, but without being experienced in Julia, it was quite intensive. Now I am thinking of creating a small use case and letting ChatGPT rewrite it in Python. However, in my experience, the code that ChatGPT generates can be buggy and is not always correct. Nevertheless, it may be a good starting point to bugfix and rewrite!
It took a while, but in the latest release there is now an implementation for continuous data modeling! There are many unit tests to ensure the quality and validity of the results, but I have not used it in a real-life scenario yet.
Check it out in the docs.
I am closing this one as there is now basic functionality that can process continuous data! See the documentation for more details.