SPORF
Issue with Predict() when number of unique predicted labels is less than number of possible labels
Describe the bug
When calling Predict(), I obtain the error:
Error in factor(predictions, labels = labels) :
invalid 'labels'; length n should be 1 or k
with n > k, every time. (n and k stand for whatever integers appear in the actual message; the values vary, but n > k always holds.)
To Reproduce
I believe the error occurs when predictions contains no predictions at all for some class that exists in the training data; the way the factor is then constructed fails. The following minimal reproducible example demonstrates the flaw in how class labels are assigned to the predictions:
x <- rep(letters[1:5], 3) # x has only 5 unique elements
factor(x, labels=LETTERS[1:10]) # note that there are more labels than unique elements of x
Error in factor(x, labels = LETTERS[1:10]) :
invalid 'labels'; length 10 should be 1 or 5
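The error arises because, when levels is not supplied, factor() infers the levels as the sorted unique values of x, and labels must then have length 1 or exactly that many entries. A short base-R sketch confirming this (nothing SPORF-specific is assumed here):

```r
x <- rep(letters[1:5], 3)

# With no `levels` argument, factor() uses the sorted unique values of x
inferred <- levels(factor(x))
print(length(inferred))  # 5 levels: "a" to "e"

# `labels` must have length 1 or length(levels); 10 labels cannot be matched
# to 5 inferred levels, so factor() signals an error (caught here for display)
res <- tryCatch(factor(x, labels = LETTERS[1:10]),
                error = function(e) e)
print(conditionMessage(res))
```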
I noticed this bug with a training set in which a single class was extremely sparsely represented (30 of 10,000 samples). My guess is that this class is simply never predicted, which triggers the error.
Expected behavior
The predictions are returned.
Desktop:
- OS: Ubuntu 18.04
- Language: R
- Version: 2.0.4
Additional context
It appears this issue can be fixed simply by specifying the levels explicitly:
x <- rep(letters[1:5], 3) # x has only 5 unique elements
factor(x, levels = letters[1:10]) # works: levels may include values that never occur in x
factor(x, levels = letters[1:10], labels = LETTERS[1:10]) # also renames each level
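To make the suggested fix concrete for the prediction case, here is a minimal sketch. It assumes, without having checked the SPORF source, that predictions are stored as integer class indices and that the class names come from a vector of training labels; the names class_labels and predictions below are hypothetical.

```r
# Hypothetical training labels: three classes, "c3" being the rare class
class_labels <- c("c1", "c2", "c3")
# Hypothetical predicted class indices; class 3 is never predicted
predictions <- c(1L, 2L, 2L, 1L)

# Buggy form: levels are inferred from unique(predictions), i.e. only 1 and 2,
# so three labels cannot be matched and factor() signals an error
buggy <- tryCatch(factor(predictions, labels = class_labels),
                  error = function(e) e)

# Fixed form: declare every possible class index as a level, then name them
fixed <- factor(predictions,
                levels = seq_along(class_labels),
                labels = class_labels)
print(fixed)
print(table(fixed))  # the unpredicted class still appears, with a count of 0
```

If the predictions already hold the label strings rather than indices, factor(predictions, levels = class_labels) alone should be enough. Keeping the unpredicted class as an empty level also means downstream code (for example a confusion table) sees all classes from training.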