Prediction of GaussianNB panics in some conditions
I'm submitting a
- [x] bug report.
Current Behaviour:
Prediction of GaussianNB fails when trained on very small data.
Expected Behaviour:
The prediction may be wrong (the assertion on y may fail), but predict should not panic.
Steps to reproduce:
```rust
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::naive_bayes::gaussian::GaussianNB;

#[test]
fn run_gaussian_naive_bayes_with_few_samples() {
    let x = DenseMatrix::<f64>::from_2d_array(&[
        &[-1., -1.],
        &[-2., -1.],
        &[-3., -2.],
        &[1., 1.],
    ]);
    let y: Vec<u32> = vec![1, 1, 1, 2];
    let gnb = GaussianNB::fit(&x, &y, Default::default()).unwrap();
    assert_eq!(gnb.classes(), &[1, 2]);
    let y_hat = gnb.predict(&x).unwrap();
    assert_eq!(y_hat, y);
}
```
Snapshot:
thread 'classifier::bayes::tests::test_detector' panicked at 'called `Option::unwrap()` on a `None` value', /Users/chriamue/.cargo/registry/src/github.com-1ecc6299db9ec823/smartcore-0.3.0/src/naive_bayes/mod.rs:109:67
https://github.com/smartcorelib/smartcore/blob/development/src/naive_bayes/mod.rs#L109
```rust
.map(|(class_index, class)| {
    (
        class,
        self.distribution.log_likelihood(class_index, &row)
            + self.distribution.prior(class_index).ln(),
    )
})
.max_by(|(_, p1), (_, p2)| p1.partial_cmp(p2).unwrap())
```
Either p1 or p2 is NaN, so partial_cmp returns None, which panics on unwrap().
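The failure mode can be reproduced in isolation, independent of smartcore: `partial_cmp` on `f64` returns `None` as soon as either operand is NaN, so the `unwrap()` inside `max_by` panics. A minimal standalone sketch (the `scores` values are made up for illustration):

```rust
fn main() {
    // partial_cmp returns None as soon as either operand is NaN.
    assert_eq!(f64::NAN.partial_cmp(&1.0), None);
    assert_eq!(1.0f64.partial_cmp(&f64::NAN), None);

    // Same pattern as in predict(): max_by over scores containing a NaN.
    let scores = vec![("a", 0.5f64), ("b", f64::NAN)];
    std::panic::set_hook(Box::new(|_| {})); // silence the panic message
    let result = std::panic::catch_unwind(|| {
        scores
            .iter()
            .max_by(|(_, p1), (_, p2)| p1.partial_cmp(p2).unwrap())
            .map(|(c, _)| *c)
    });
    // The unwrap() inside max_by panicked, mirroring the reported backtrace.
    assert!(result.is_err());
    println!("max_by panicked on NaN, as in the issue");
}
```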
Do you want to work on this issue?
A solution would be to check log_likelihood for NaN and return 0, but I am not sure whether this is mathematically correct. https://github.com/smartcorelib/smartcore/blob/development/src/naive_bayes/gaussian.rs#L77
```rust
fn log_likelihood<'a>(&self, class_index: usize, j: &'a Box<dyn ArrayView1<X> + 'a>) -> f64 {
    let mut likelihood = 0f64;
    for feature in 0..j.shape() {
        let value = X::to_f64(j.get(feature)).unwrap();
        let mean = self.theta[class_index][feature];
        let variance = self.var[class_index][feature];
        likelihood += self.calculate_log_probability(value, mean, variance);
    }
    if likelihood.is_nan() {
        0f64
    } else {
        likelihood
    }
}
```
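For context on why the likelihood goes NaN in this repro (my reading, not verified against smartcore internals): class 2 has only one training sample, so its per-feature variance is 0, and plugging variance 0 into the Gaussian log-density yields a 0/0 (or -inf + inf) that propagates as NaN. A sketch of the arithmetic, with `log_gaussian` as a hypothetical stand-in for `calculate_log_probability`:

```rust
use std::f64::consts::PI;

// Hypothetical stand-in for calculate_log_probability:
// the log of the Gaussian probability density.
fn log_gaussian(value: f64, mean: f64, variance: f64) -> f64 {
    -0.5 * ((value - mean).powi(2) / variance) - 0.5 * (2.0 * PI * variance).ln()
}

fn main() {
    // Healthy case: nonzero variance gives a finite log-likelihood.
    assert!(log_gaussian(1.0, 1.0, 0.5).is_finite());

    // Degenerate case: a class with a single sample has variance 0.
    // (value - mean)^2 / variance = 0.0 / 0.0 = NaN, and ln(0) = -inf,
    // so the sum is NaN.
    let ll = log_gaussian(1.0, 1.0, 0.0);
    assert!(ll.is_nan());
    println!("log-likelihood with zero variance: {ll}");
}
```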
Another solution is to check for NaN in the predict function: https://github.com/smartcorelib/smartcore/blob/development/src/naive_bayes/mod.rs#L102
```rust
.map(|(class_index, class)| {
    let mut log_likelihood = self.distribution.log_likelihood(class_index, &row);
    if log_likelihood.is_nan() {
        log_likelihood = 0.0;
    }
    (
        class,
        log_likelihood + self.distribution.prior(class_index).ln(),
    )
})
```
thanks for finding this 👍
I think the right approach would be to return an error if log_likelihood is NaN. Let me check.
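One panic-free way to express the comparison (an alternative sketch under my own assumptions, not necessarily what will land in smartcore) is `f64::total_cmp`, which imposes a total order on floats, NaN included, so `max_by` needs no `unwrap()`. Since NaN sorts above every number under `total_cmp`, a NaN score would win the argmax, so the error check suggested above is layered on afterwards. Here `argmax_class` and the `String` error are placeholders, not smartcore's actual predict or `Failed` type:

```rust
// Sketch of a panic-free argmax over (class, score) pairs, where the score
// stands in for log_likelihood + prior.ln() as built in predict().
fn argmax_class(scored: &[(u32, f64)]) -> Result<u32, String> {
    let (class, best) = scored
        .iter()
        // total_cmp gives a total order over f64 (NaN sorts above +inf),
        // so no Option and no unwrap() are involved.
        .max_by(|(_, p1), (_, p2)| p1.total_cmp(p2))
        .ok_or_else(|| "no classes".to_string())?;
    if best.is_nan() {
        // Surface the degenerate likelihood as an error instead of panicking.
        return Err("log-likelihood is NaN".to_string());
    }
    Ok(*class)
}

fn main() {
    assert_eq!(argmax_class(&[(1, -0.3), (2, -1.2)]), Ok(1));
    assert!(argmax_class(&[(1, f64::NAN), (2, -1.2)]).is_err());
    println!("ok");
}
```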