pgmpy icon indicating copy to clipboard operation
pgmpy copied to clipboard

Fix joint distribution error during Exact Inference

Open MadDataScience opened this issue 8 years ago • 8 comments

This may be the same bug identified by @Erotemic in issue #534

I noticed that, when given multiple variables, _variable_elimination computes them each independently, marginalizing over the others. This is a problem when the variables are not independent. I added a flag, marginal_independent, which, if set to False, will return the joint factor for all variables queried. It defaults to True in order to maintain backwards compatibility.

The default (old) behavior returns a dictionary where the keys are the variables and the values are the factors. Since the new version only returns a single joint probability distribution, it seems unnecessary to return it nested in a dictionary. As a result, the new version simply returns the factor directly. I realize this is a no-no as it means the method returns two different data types based on which flag is set. But it carries the benefit that the query can now be nested directly in anything that expects a Factor. I hope that this improved ease of use justifies the breaking of convention, and that we can eventually drop the old way entirely.

MadDataScience avatar Apr 12 '16 18:04 MadDataScience

Coverage Status

Coverage decreased (-0.01%) to 95.783% when pulling 91a697457e2bd50061543741b7e514934ec345c1 on MadDataScience:hotfix/variable_elimination_dependence into d565bafcadcb4cea0ce700379f79deb8cf756997 on pgmpy:dev.

coveralls avatar Apr 12 '16 18:04 coveralls

@MadDataScience

I noticed that, when given multiple variables, _variable_elimination computes them each independently, marginalizing over the others. This is a problem when the variables are not independent.

Yeah, this is indeed a problem.

Returning two different data types can be really a mess, and user have to be really careful for this. I'll not agree on returning two different data types.

But it carries the benefit that the query can now be nested directly in anything that expects a Factor. I hope that this improved ease of use justifies the breaking of convention, and that we can eventually drop the old way entirely.

I don't completely agree at here with the advantage. Well we can have an additional parameter for this, I believe that will provide user a more flexibility. Right now if user doesn't want all the marginal factors but a complete factor , he needs to iterate over and than compute what we have already computed, so it is a waste of calculation and time. But if user wants to have marginal factors than current implementation is better. @ankurankan What are your views?

khalibartan avatar Apr 13 '16 08:04 khalibartan

@MadDataScience @khalibartan Yes, you guys are right. I think we should by default just return the joint factor on all the variables in the query list. And I don't think we need the extra parameter. Also @MadDataScience you will need to make similar change in BeliefPropagation.

ankurankan avatar Apr 18 '16 01:04 ankurankan

Okay, I think I did what you asked. I tested it with a BeliefPropagation query with two variables and it returned as expected. I have not tested MAP queries. I left it with the default behavior as before because I didn't want to start mucking around with nosetests and whatever else would blow up if suddenly the default behavior changed.

MadDataScience avatar Apr 19 '16 00:04 MadDataScience

Coverage Status

Coverage decreased (-0.02%) to 95.771% when pulling e8ded2dfc1715b2ee37bc72d2a47ecd091056a61 on MadDataScience:hotfix/variable_elimination_dependence into d565bafcadcb4cea0ce700379f79deb8cf756997 on pgmpy:dev.

coveralls avatar Apr 19 '16 00:04 coveralls

It seems like a dev decision is required for this PR to move forward. The implementation here fixes the issue, but I don't like that tries to preserve the existing behavior. In my opinion, when you query with multiple variables, it should always return the joint distribution of those variables not their marginals.

What do you guys want to do? And what else needs to be done?

chebee7i avatar Jun 17 '16 14:06 chebee7i

Any thoughts on this? I would like to help, but I don't know codebase and ramifications well enough to know what should be done.

chebee7i avatar Jul 10 '16 03:07 chebee7i

@chebee7i I think this depends on #488. If you would like to work on #488 please go ahead. I think you will need to fix tests mostly.

ankurankan avatar Jul 11 '16 06:07 ankurankan