KeyError in notebook "1. Intro..."

Open ansgar-t opened this issue 3 years ago • 4 comments

Depending on the random train/test split, this code can raise a KeyError:

for i in X_test_features:
    predicted_values.append(joint_prob[i[0], i[1]].idxmax())

Here's a possible alternative:

for i in X_test_features:
    key = (i[0], i[1])
    conditional_prob = joint_prob[key].idxmax() if key in joint_prob else 0.0
    predicted_values.append(conditional_prob)

ansgar-t avatar May 28 '22 17:05 ansgar-t

@ansgar-t Yes, this seems to be an issue. Thanks for reporting it. However, I think it would be better to assign None/NaN for keys that don't exist, since 0 is an actual class value and using it as the fallback would lead to misclassification. Something like: joint_prob[key].idxmax() if key in joint_prob else None. What do you think? And would you like to create a PR for it?
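For illustration, here is a minimal sketch of the None fallback. The toy DataFrame, its column names, and the test pairs are made up for this example and only stand in for the notebook's data; joint_prob is assumed to be a Series indexed by (length, width, type), built with groupby(...).size().

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's training set.
train = pd.DataFrame({
    "length": [1, 1, 1, 2, 2],
    "width":  [3, 3, 3, 4, 4],
    "type":   [0, 1, 1, 2, 2],
})

# Estimated joint probability, indexed by (length, width, type).
joint_prob = train.groupby(["length", "width", "type"]).size() / len(train)

# The pair (9, 9) never occurs in the training data.
X_test_features = [(1, 3), (2, 4), (9, 9)]

predicted_values = []
for i in X_test_features:
    key = (i[0], i[1])
    # A partial key selects all rows whose index starts with (length, width);
    # unseen pairs get None instead of a real class value.
    predicted_values.append(joint_prob[key].idxmax() if key in joint_prob else None)

# Types 1 and 2 for the two seen pairs, None for the unseen pair.
print(predicted_values)
```

Any accuracy computation that follows would then need to decide how to treat the None entries, for example by counting them as wrong or by skipping them.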

ankurankan avatar Jun 03 '22 07:06 ankurankan

you're right about the else case, of course. "0.0" doesn't make sense there.

looks like I had meant to separate 2 steps:

  • Looking up estimated conditional probabilities
  • Maximizing them.

... and then I didn't. :)
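A rough sketch of what keeping the two steps apart could look like. The helper names conditional_probs and predict_type are hypothetical, as is the toy data; joint_prob is assumed to be a Series indexed by (length, width, type).

```python
import pandas as pd

def conditional_probs(joint_prob, length, width):
    """Step 1: look up P(type | length, width); None if the pair was never seen."""
    key = (length, width)
    if key not in joint_prob:
        return None
    selected = joint_prob[key]          # Series indexed by type
    return selected / selected.sum()    # normalise the joint slice

def predict_type(joint_prob, length, width):
    """Step 2: pick the most probable type, or None if there is nothing to maximise."""
    probs = conditional_probs(joint_prob, length, width)
    return None if probs is None else probs.idxmax()

# Hypothetical toy data, only for demonstration.
train = pd.DataFrame({
    "length": [1, 1, 1, 2, 2],
    "width":  [3, 3, 3, 4, 4],
    "type":   [0, 1, 1, 2, 2],
})
joint_prob = train.groupby(["length", "width", "type"]).size() / len(train)

print(conditional_probs(joint_prob, 1, 3))
print(predict_type(joint_prob, 1, 3))
print(predict_type(joint_prob, 9, 9))
```

With the lookup separated out, the "no data for this pair" case is handled in one place and the maximisation never sees a missing key.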

ansgar-t avatar Jun 06 '22 06:06 ansgar-t

having said that...

I think adding the following code to the preparation of joint_prob would be my preferred solution now:

import pandas as pd

# Make sure the estimated joint probability is defined over the full domain,
# using 0.0 for value combinations not seen in the data.

length_domain = range(16)  # assuming length cannot exceed 15
width_domain = range(11)   # assuming width cannot exceed 10
type_domain = range(3)

full_index = pd.MultiIndex.from_product([length_domain, width_domain, type_domain])

joint_prob = joint_prob.reindex(full_index).fillna(0.0)
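To see what the reindexing does, here is a small self-contained sketch. The toy data and the deliberately tiny domains are assumptions for this example, not the notebook's real values.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's training set.
train = pd.DataFrame({
    "length": [1, 1, 1, 2, 2],
    "width":  [3, 3, 3, 4, 4],
    "type":   [0, 1, 1, 2, 2],
})
joint_prob = train.groupby(["length", "width", "type"]).size() / len(train)

# Small made-up domains, just to keep the example readable.
full_index = pd.MultiIndex.from_product(
    [range(1, 4), range(3, 6), range(3)],
    names=["length", "width", "type"],
)
joint_prob_full = joint_prob.reindex(full_index).fillna(0.0)

seen = joint_prob_full[1, 3]     # pair present in the training data
unseen = joint_prob_full[3, 5]   # pair absent from the training data

print(seen.idxmax())    # most probable type for a seen pair
print(unseen.sum())     # all probabilities are 0.0 for an unseen pair
print(unseen.idxmax())  # idxmax returns the first label when all values tie
```

The lookup no longer raises for unseen pairs. One thing to keep in mind: because every probability for an unseen pair is 0.0, idxmax falls back to the first type label, so such pairs end up predicted as type 0 unless they are handled separately.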

ansgar-t avatar Jun 06 '22 10:06 ansgar-t

@ansgar-t Sorry for the super late reply. Yes, this solution also looks good. It would be great if you could open a PR with the fix :). Thanks

ankurankan avatar Jun 27 '22 13:06 ankurankan