lolo
A random forest
`LinearRegressionLearner` solves for the linear coefficients using a pseudoinverse, which is numerically unstable. It should be trivial to replace this with a call to LAPACK's `dgels` or `dgelsd`.
It's helpful for various reasons to have access to the individual predictions made by each tree in the ensemble, in addition to the usual average over the ensemble, uncertainties, etc. that...
Bagger uses a Poisson bootstrap. This converges in probability to the ordinary multinomial bootstrap in the large data limit, but we should confirm it's a suitable approximation for our small...
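To make the distinction concrete, here is a minimal standard-library sketch (not lolo's Scala implementation) that contrasts the two resampling schemes on a small data set. The function names and the choice of `n = 10` are illustrative assumptions. It highlights one visible difference at small `n`: the multinomial bootstrap always draws exactly `n` rows, while the total drawn by the Poisson bootstrap fluctuates around `n`.

```python
import math
import random


def poisson_sample(rng, lam=1.0):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_bootstrap_weights(n, rng):
    """Give each row an independent Poisson(1) count; the total is random."""
    return [poisson_sample(rng) for _ in range(n)]


def multinomial_bootstrap_weights(n, rng):
    """Draw exactly n rows with replacement and count how often each appears."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


rng = random.Random(0)
n = 10  # deliberately small training set (illustrative value)
poisson_totals = [sum(poisson_bootstrap_weights(n, rng)) for _ in range(1000)]
multinomial_totals = [sum(multinomial_bootstrap_weights(n, rng)) for _ in range(1000)]
```

Comparing the spread of `poisson_totals` against the constant `multinomial_totals` is one simple way to start judging whether the approximation is acceptable at a given training-set size.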
See MultiBaggerTest for an example of how multi-task learning is not as thoroughly exercised as its single-task counterparts.
If the training labels contain repeated values, then it is increasingly possible that every tree in the ensemble makes the same prediction (even if the input values are...
I might be mistaken, but lolopy does not seem to support categorical inputs. Input of categorical features fails in utils.py with an attempted cast of X to np.float64. @WardLT If...
When calling `UncertaintyCorrelation` with predictive distributions that have constant uncertainty, the value `varSigma` ([line 131 in `Merit.scala`](https://github.com/CitrineInformatics/lolo/blob/45bc1cc0d64d8c6a726005fb8e660ee7ebd1b582/src/main/scala/io/citrine/lolo/validation/Merit.scala#L131)) is zero, so the denominator is zero. As the numerator is also zero, not-a-number...
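The 0/0 case can be illustrated with a small Python sketch of a guarded Pearson correlation. This is not the code in `Merit.scala`; the function name `pearson_or_default` and its `default` parameter are hypothetical, and which fallback value is appropriate is left open here.

```python
import math


def pearson_or_default(xs, ys, default=0.0):
    """Pearson correlation that returns `default` when either input is constant.

    Hypothetical helper: with a constant input, both the covariance and the
    denominator are zero, so the unguarded ratio would be 0/0.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(var_x * var_y)
    if denominator == 0.0:
        return default  # degenerate case: at least one input has zero variance
    return cov / denominator


errors = [1.0, 2.0, 3.0]
constant_sigma = [0.5, 0.5, 0.5]  # constant predicted uncertainty
guarded = pearson_or_default(errors, constant_sigma)
explicit_nan = pearson_or_default(errors, constant_sigma, default=float("nan"))
```

The exact comparison with zero only catches inputs that are constant to the last bit; a tolerance-based check may be preferable when the uncertainties are nearly, but not exactly, constant.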
The error message from lolopy when Java isn't installed is: `ValueError: invalid literal for int() with base 10: b''` We should make a better error message for this issue, and...
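One possible shape for a clearer failure, sketched with the standard library only. The helper `require_java` is hypothetical and is not part of lolopy; it assumes that the Java runtime is located through an executable named `java` on `PATH`.

```python
import shutil


def require_java(executable="java"):
    """Return the path to the Java executable, or fail with a readable message.

    Hypothetical helper, not lolopy's API: it only checks that the executable
    can be found on PATH and does not inspect the Java version.
    """
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            f"Could not find a '{executable}' executable on PATH. "
            "lolopy needs a Java runtime; install one and try again."
        )
    return path
```

Running a check like this before any attempt to parse the Java version would surface the missing installation directly instead of as an `int()` parsing error.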
There are cases where I want to train a bagged model in serial. A constructor argument for the bagger class that turns off parallelism would be nice.
The maximum value of the Gini impurity is `(n-1)/n`, where `n` is the number of classes. This could cause multitask models to be biased towards modeling multi-class labels more accurately...
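The bound can be checked numerically with a short sketch. The function names are illustrative; the impurity is computed as one minus the sum of squared class probabilities, and it reaches its maximum at the uniform distribution.

```python
def gini_impurity(class_probabilities):
    """Gini impurity: one minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities)


def max_gini(n_classes):
    """Largest possible Gini impurity for n classes, reached when all are equally likely."""
    return (n_classes - 1) / n_classes


# Impurity of the uniform distribution for a few class counts.
uniform_impurities = {n: gini_impurity([1.0 / n] * n) for n in (2, 3, 10)}
```

For a binary label the ceiling is 0.5, while for ten classes it is 0.9, so the raw impurities of labels with different numbers of classes are not on the same scale.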