Ben Sadeghi

Results 40 comments of Ben Sadeghi

@Eight1911 It's worth a try to see how the above approach affects how the pruning exercise "feels". You can test it out using the [Iris pruning runs](https://github.com/bensadeghi/DecisionTree.jl/blob/9a6d9e53e6a82a307d36ef2feff4d52db93b997c/test/classification/iris.jl#L26). The current...
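To get a feel for what post-pruning by purity does, here is a minimal sketch of the idea, using toy `Leaf`/`Node` structs of our own rather than DecisionTree.jl's actual types: sibling leaves are merged bottom-up whenever the merged leaf's purity clears the threshold.

```julia
# Toy tree types for illustration only (not the DecisionTree.jl types).
struct Leaf
    counts::Dict{String,Int}   # class label => count of training samples at this leaf
end

struct Node
    featid::Int
    thresh::Float64
    left::Union{Leaf,Node}
    right::Union{Leaf,Node}
end

# Purity = fraction of samples belonging to the majority class.
purity(l::Leaf) = maximum(values(l.counts)) / sum(values(l.counts))

merge_leaves(a::Leaf, b::Leaf) = Leaf(mergewith(+, a.counts, b.counts))

# Bottom-up pruning: collapse a node into a single leaf when the merged
# leaf's purity meets the threshold (the purity_thresh idea).
prune(l::Leaf, thresh) = l
function prune(n::Node, thresh)
    left, right = prune(n.left, thresh), prune(n.right, thresh)
    if left isa Leaf && right isa Leaf
        merged = merge_leaves(left, right)
        purity(merged) >= thresh && return merged
    end
    return Node(n.featid, n.thresh, left, right)
end
```

With a 9-vs-1 split, a threshold of 0.9 collapses the node into one leaf, while 0.95 leaves the split in place.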

@Eight1911 I don't think we need another pre-pruning criterion. Note that scikit-learn is [deprecating](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) `min_impurity_split` in favor of `min_impurity_decrease`. Personally, I'm quite comfortable with the current implementation of `prune_tree` for...
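For reference, the difference between the two criteria is where the check happens: `min_impurity_split` thresholds the parent's impurity, while `min_impurity_decrease` thresholds how much a split actually reduces it. A hedged sketch of the decrease check (names and simplified weighting are ours, not scikit-learn's exact formula):

```julia
# Shannon entropy of a vector of class proportions.
entropy(p) = -sum(x -> x == 0 ? 0.0 : x * log2(x), p)

# Weighted impurity decrease of splitting a parent (class proportions)
# into left/right children holding nl and nr samples respectively.
function impurity_decrease(parent, left, right, nl, nr)
    n = nl + nr
    entropy(parent) - (nl / n) * entropy(left) - (nr / n) * entropy(right)
end

# The split is kept only if the decrease clears the threshold.
accept_split(dec, min_dec) = dec >= min_dec
```

Splitting a 50/50 parent into two pure children yields the maximal decrease of 1.0 bit, so it passes any reasonable threshold.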

Yeah, `build_adaboost_stumps` has always had issues, and it uses a different optimization technique than `build_tree`, one that is quite slow. Not sure what to do here; it requires significant work....

Yeah, this is an issue. Back to your example, note that the tree generated is actually a leaf, and so there is no decision to be made based on input...

I'm still hesitant to add a new field to the `Node` type. If this issue is handled in SKL.jl, then it's ok. And yes, the bloated models need to be...

The split routines already identify which features have the most predictive power (information gain) via Shannon entropy. So IMO, manually identifying/defining which features are of high importance is unnecessary, and...
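To make that concrete, here is a toy sketch of the quantity the split routines maximize: the information gain (entropy reduction) of splitting on a feature at a given threshold. This is illustration-only code, not the DecisionTree.jl internals.

```julia
# Shannon entropy of a label vector.
function entropy(labels)
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    n = length(labels)
    -sum(c -> (c / n) * log2(c / n), values(counts))
end

# Information gain of splitting feature column x at threshold t,
# given label vector y.
function info_gain(x, y, t)
    mask = x .< t
    yl, yr = y[mask], y[.!mask]
    (isempty(yl) || isempty(yr)) && return 0.0
    n = length(y)
    entropy(y) - length(yl) / n * entropy(yl) - length(yr) / n * entropy(yr)
end
```

A feature that cleanly separates the classes scores the full gain (1.0 bit for two balanced classes), which is exactly why manual importance tagging is redundant.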

Works fine on the DT.jl side. The issue might be with MLJ.jl, potentially need to overload isless().

```julia
using Random, DecisionTree
features, labels = load_data("adult")
# Note that the data...
```

@ablaom Yes, lexicographical order is used for the splitting criteria, where subsets of the features are sorted before being searched through for the best split (via information gain). I'm not...
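The sort-then-scan search can be sketched as follows: the feature's distinct values are sorted, and midpoints between adjacent values are tried as candidate thresholds, keeping whichever scores best. The function and names here are illustrative, not the actual DT.jl internals; `gainfn` stands in for the information-gain evaluation.

```julia
# Find the best split threshold for one feature column by scanning
# midpoints of the sorted distinct values.
function best_threshold(x, gainfn)
    xs = sort(unique(x))          # sorted (lexicographic/numeric) order
    best_t, best_g = NaN, -Inf
    for i in 1:length(xs)-1
        t = (xs[i] + xs[i+1]) / 2  # candidate split point
        g = gainfn(t)
        if g > best_g
            best_t, best_g = t, g
        end
    end
    return best_t, best_g
end
```

Sorting first is what makes the scan linear in the number of distinct values; it is also why a well-defined ordering (hence `isless`) on the feature's element type matters.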

Thanks @ablaom. I've updated the readme with your input.

You could cast the features to a concrete type (i.e. `X = Int.(X)`) as opposed to using the `Any` type, which is quite heavy. That should help a little bit....
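As a quick illustration of the cast (toy data, just to show the element-type change):

```julia
# A Matrix{Any} stores each cell as a boxed pointer, which is slow and heavy.
X_any = Any[1 2; 3 4]

# Broadcasting the constructor narrows it to a concrete, contiguous Matrix{Int}.
X = Int.(X_any)
```

With a concrete element type the compiler can specialize the split loops, so this is usually an easy win before reaching for anything fancier.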