DTree train doesn't classify training data correctly
Transferred from http://code.opencv.org/issues/4281
|| Siddharth Krishna on 2015-04-17 00:01
|| Priority: Normal
|| Affected: branch '2.4' (2.4-dev)
|| Category: ml
|| Tracker: Bug
|| Difficulty:
|| PR:
|| Platform: x64 / Linux
I need to train a decision tree that fits my training data completely. I _want_ it to over-fit: the tree should not be pruned, and it should keep growing until every leaf contains samples of a single label. This is a classification task with two labels. Here are the params I used:
<pre>
CvDTreeParams params;
params.min_sample_count = -1;
params.regression_accuracy = 0;
params.use_surrogates = false;
params.truncate_pruned_tree = false;
params.cv_folds = 0;
params.use_1se_rule = false;
</pre>
And here is how I'm training:
<pre>
cv::Mat trainData(numSamples, dim, CV_32FC1);
cv::Mat trainLabels(numSamples, 1, CV_32SC1);
// ...
CvDTree* dtree = new CvDTree();
// one entry per input variable plus one for the response;
// newDim is assumed to equal the number of columns in trainData (dim above)
cv::Mat var_type(newDim + 1, 1, CV_8U);
// all inputs are numerical
var_type.setTo(cv::Scalar(CV_VAR_NUMERICAL));
// output is categorical
var_type.at<uchar>(newDim, 0) = CV_VAR_CATEGORICAL;
dtree->train(trainData, CV_ROW_SAMPLE, trainLabels,
             cv::Mat(), cv::Mat(), var_type, cv::Mat(), params);
</pre>
From the documentation, this should grow a tree that classifies all training data correctly. But on the input attached as samples.txt (each row is one point, last integer on each row is the label), this returns a tree that misclassifies a training point.
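To make the misclassification easy to reproduce, the check can be sketched roughly as follows. This is a minimal sketch in plain C++, not the exact code used for the report. It assumes the rows of samples.txt are whitespace-separated, and the names `Sample`, `loadSamples` and `misclassified` are hypothetical helpers introduced only for illustration. The trained model is passed in as a generic callable so the helper itself does not depend on OpenCV.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One training point: feature values plus the integer class label.
struct Sample {
    std::vector<float> features;
    int label;
};

// Parse rows of whitespace-separated numbers (assumed layout); the last
// number on each row is taken as the label, as described for samples.txt.
std::vector<Sample> loadSamples(std::istream& in) {
    std::vector<Sample> samples;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::vector<float> values;
        float v;
        while (row >> v) values.push_back(v);
        if (values.empty()) continue;  // skip blank lines
        Sample s;
        s.label = static_cast<int>(values.back());
        values.pop_back();
        s.features = values;
        samples.push_back(s);
    }
    return samples;
}

// Return the indices of training points whose predicted label differs from
// the stored label. `predict` is any callable wrapping the trained model.
std::vector<std::size_t> misclassified(
    const std::vector<Sample>& samples,
    const std::function<int(const std::vector<float>&)>& predict) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (predict(samples[i].features) != samples[i].label) bad.push_back(i);
    }
    return bad;
}
```

With the 2.4 API, the callable would presumably copy the features into a 1 x dim `CV_32FC1` row and return `dtree->predict(row)->value` cast to `int`. A tree that fits the training data completely should yield an empty index list.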
History
Vadim Pisarevsky on 2015-04-27 11:11
- Category set to ml