Categorical input support for lolopy
I might be mistaken, but lolopy does not seem to support categorical inputs. Input of categorical features fails in utils.py with an attempted cast of X to np.float64. @WardLT
If there's a set way of providing categoricals to lolopy, it'd be useful to document or provide an example.
Could you provide a stack trace? We do have support for using lolo's random forest for classification with RandomForestClassifier.
Just to clarify, I meant using a categorical as one of the input dimensions. For example:
X = [['a', 1.0, 2.0], ['b', 1.5, 2.2], ...]
and
y = [5.5, 6.7, ...]
for rf = RandomForestRegressor(), where I'm trying rf.fit(X, y). Sorry if this was not the intended usage.
Oh, I misunderstood your question, sorry!
Correct, lolopy does not support categorical inputs. How do the underlying methods in lolo handle them?
Ok, thanks for clarifying! I don't really know the Scala side. There is an encoder written by @maxhutch. Happy to try to (eventually) figure it out and submit a PR to add support to lolopy, though.
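Until lolopy accepts categorical inputs directly, one possible workaround is to one-hot encode the categorical column before calling fit, so that every entry can be cast to a float. The following is a minimal sketch in plain Python; the helper one_hot_encode_column is hypothetical and not part of lolopy, and the third row of X is made up for illustration.

```python
def one_hot_encode_column(X, col):
    """Replace column `col` of X with one 0/1 indicator column per category.

    Hypothetical helper for illustration; it is not part of lolopy.
    Returns the encoded rows and the sorted list of categories.
    """
    categories = sorted({row[col] for row in X})
    encoded = []
    for row in X:
        # One indicator per category, in sorted order
        indicator = [1.0 if row[col] == c else 0.0 for c in categories]
        # Keep the remaining (numeric) columns, cast to float
        rest = [float(v) for i, v in enumerate(row) if i != col]
        encoded.append(indicator + rest)
    return encoded, categories


X = [['a', 1.0, 2.0], ['b', 1.5, 2.2], ['a', 0.7, 1.9]]
X_num, cats = one_hot_encode_column(X, 0)
# X_num contains only floats, so the cast to np.float64 should no longer fail
```

Note that one-hot encoding is a substitute technique: the trees then split on indicator columns rather than on the category itself, so the results need not match what lolo's native categorical handling would give.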
@WardLT it handles them seamlessly by encoding them into Char (only up to 256 categories are supported) and then using a special splitter for them.
The trick is going to be sending a Vector[Any], where some of those Any are Double and some of them are objects. In lolo, they don't even have to be strings:
https://github.com/CitrineInformatics/lolo/blob/develop/src/main/scala/io/citrine/lolo/trees/regression/RegressionTree.scala#L45
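To illustrate the encoding idea described above, here is a rough Python sketch of mapping the distinct values of a categorical column to small integer codes with a cap of 256 categories. This is not lolo's implementation (which is in Scala); the function name, the constant, and the rule "anything that is not a float is categorical" are assumptions made for the example.

```python
MAX_CATEGORIES = 256  # cap mentioned in the comment above


def encode_mixed_column(values):
    """Encode one feature column that may hold floats or arbitrary objects.

    Hypothetical sketch: a column of only floats is returned unchanged with
    no code table; otherwise each distinct value gets an integer code in
    order of first appearance. Values must be hashable, but they do not
    have to be strings.
    """
    if all(isinstance(v, float) for v in values):
        return list(values), None

    codes = {}
    for v in values:
        if v not in codes:
            if len(codes) >= MAX_CATEGORIES:
                raise ValueError("more than %d categories" % MAX_CATEGORIES)
            codes[v] = len(codes)
    return [codes[v] for v in values], codes


encoded, table = encode_mixed_column(['a', 'b', 'a'])
numeric, no_table = encode_mixed_column([1.0, 1.5, 0.7])
```

A tree learner would still need a dedicated splitter for the encoded column, since the integer codes carry no ordering.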