EvoTrees.jl
Categorical and mixed feature types
Support non one-hot encoded categorical features: features carrying item info as an Int (1 to N levels). Consider changing from a Matrix to a DataFrames input structure to handle mixed features. Consider supporting a mix of input structures: DataFrames + SparseMatrix for efficient handling of a mixture of dense (continuous and categorical) and sparse features.
Support non one-hot encoded categorical features: features carrying item info as an Int (1 to N levels).
It may be better to support categorical arrays, so that classes which disappear on sub-sampling are still accounted for. Alternatively, require the total number of classes to be specified as metadata.
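To illustrate the sub-sampling concern, here is a minimal, language-neutral sketch in Python (not EvoTrees.jl code). The helper `level_counts` and its `n_levels` argument are hypothetical names introduced only for this example. It shows how inferring the number of levels from the data loses a rare level, while declaring it as metadata keeps the per-level layout stable.

```python
def level_counts(codes, n_levels=None):
    """Count observations per level for integer codes in 1..N.

    If n_levels is None, the number of levels is inferred from the data,
    which silently shrinks when a level is absent from the sample.
    """
    if n_levels is None:
        n_levels = max(codes)
    counts = [0] * n_levels
    for c in codes:
        counts[c - 1] += 1  # codes are 1-based, list indices are 0-based
    return counts


# A feature with 4 levels; level 4 is rare (one observation).
codes = [1, 1, 2, 2, 3, 3, 1, 2, 3, 4]

# A subsample that happens to miss level 4 (fixed slice, for reproducibility).
subsample = codes[:6]

inferred = level_counts(subsample)              # level 4 is lost
declared = level_counts(subsample, n_levels=4)  # level 4 is kept with a zero count
```

With the level count declared up front, every subsample yields a vector of the same length, so per-level statistics stay aligned across trees.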
You should also make a clear distinction between ordered factors and arbitrary categoricals, which are handled differently in tree-based algorithms. Support for the former is easy - you probably already have the potential to do this by just requiring users to encode with "integer floats", yes? By the way, DecisionTree, although it does not support arbitrary categoricals, does support any type implementing the order <, and I assume you could do the same?
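As a rough illustration of why the two cases differ, the sketch below enumerates candidate splits under the textbook treatment: thresholds for an ordered factor, two-way partitions of the level set for an arbitrary categorical. It is written in Python as a language-neutral example; the function names are hypothetical, and it does not describe how EvoTrees.jl or DecisionTree actually search for splits.

```python
from itertools import combinations


def ordered_splits(levels):
    """Candidate splits for an ordered factor: one threshold between adjacent levels."""
    ordered = sorted(levels)  # only requires that the levels implement `<`
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


def unordered_splits(levels):
    """Candidate splits for an arbitrary categorical: every two-way partition."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    splits = []
    # Keep the first level on the left so mirrored partitions are not counted twice.
    # k stops short of len(rest), so the right side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = [first, *extra]
            right = [lvl for lvl in rest if lvl not in extra]
            splits.append((left, right))
    return splits
```

For L levels, the ordered case yields L - 1 candidates while the unordered case yields 2^(L-1) - 1, which is why ordered factors can reuse the numeric split search but arbitrary categoricals need dedicated handling.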
Consider changing from a Matrix to a DataFrames input structure to handle mixed features.
Perhaps you might as well support any tabular format that implements the Tables.jl interface?
Implemented in v0.15