Nitin Madnani
We can use [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler) for sparse data. Right now, we convert things to dense if we want to do scaling.
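A minimal sketch of the idea (data here is illustrative): `MaxAbsScaler` accepts scipy sparse input directly, so the densification step could be skipped.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# A small sparse feature matrix; no conversion to dense is needed.
X = csr_matrix(np.array([[1.0, -2.0], [2.0, 0.0], [4.0, 1.0]]))

# Each column is divided by its maximum absolute value,
# which preserves sparsity (zeros stay zeros).
scaled = MaxAbsScaler().fit_transform(X)
```

The output remains a CSR matrix, so downstream sparse-aware learners are unaffected.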
Should be fairly easy since it's pretty much like AdaBoost in terms of integration.
It would be nice to have a single requirements file, and it should be possible since we have done something similar for [RSMTool](https://github.com/EducationalTestingService/rsmtool).
Scikit-learn's `learning_curve` method in 0.22 will include an option to return the time it took to fit each point in the learning curve, which could provide another useful parameter...
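The new option is the `return_times` flag: when set, `learning_curve` additionally returns per-point fit and score times. A hedged sketch (toy data, arbitrary estimator):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=200, random_state=42)

# With return_times=True, two extra arrays come back:
# fit_times and score_times, one row per training-set size,
# one column per CV fold.
sizes, train_scores, test_scores, fit_times, score_times = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=[0.5, 1.0],
    cv=3,
    return_times=True,
)
```

Those timing arrays could be surfaced in SKLL's learning-curve output alongside the scores.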
Looks like the `roc_auc` metric will support multi-class classification in sklearn 0.22.
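For reference, the multi-class support works through the `multi_class` argument of `roc_auc_score`, which takes per-class probability estimates (toy values below are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 2])

# Probability estimates, one column per class; rows must sum to 1
# for the multi-class case.
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
])

# "ovr" = one-vs-rest averaging over the classes.
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")
```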
One way to do this is via https://github.com/civisanalytics/civisml-extensions I am not sure what the best approach for incorporation is: (a) include their code as a dependency and use it as...
There is a [new experimental implementation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html#sklearn.ensemble.HistGradientBoostingClassifier) for GBTs which is supposed to be orders of magnitude faster than the vanilla GBTs for N > O(10K), which is not an uncommon...
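A sketch of how it would be used (toy data; note that in the 0.21/0.22 releases the estimator sits behind an experimental-import flag, while later releases expose it directly):

```python
try:
    # Recent scikit-learn: plain import works.
    from sklearn.ensemble import HistGradientBoostingClassifier
except ImportError:
    # 0.21/0.22: the experimental flag must be imported first.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
    from sklearn.ensemble import HistGradientBoostingClassifier

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# Histogram-based GBT: bins continuous features, which is where
# the large-N speedup comes from.
clf = HistGradientBoostingClassifier(max_iter=50, random_state=0).fit(X, y)
acc = clf.score(X, y)
```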
It would be nice to have some more feature pre-processing functionality in SKLL, e.g., feature truncation and transformation sort of like what's in RSMTool. We could have a separate `preprocessing`...
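One possible shape for this, sketched as a scikit-learn-style transformer (the class name is hypothetical, and the mean ± k·SD truncation rule is only an RSMTool-like example, not a committed design):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class FeatureTruncator(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each feature to its
    training mean +/- k standard deviations (RSMTool-style
    outlier truncation)."""

    def __init__(self, k=4.0):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        mean, std = X.mean(axis=0), X.std(axis=0)
        self.lower_ = mean - self.k * std
        self.upper_ = mean + self.k * std
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


# One column with an extreme outlier; a small k makes the clipping visible.
X = np.array([[0.0], [1.0], [2.0], [100.0]])
Xt = FeatureTruncator(k=1.0).fit_transform(X)
```

Because it follows the fit/transform protocol, such a step would slot into a `Pipeline` ahead of the learner.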
Investigate whether using something like [optuna](https://optuna.org) is better than plain ol' grid search when it comes to the non-deep-learning algorithms that are in SKLL/scikit-learn.
[`LogisticRegression`](https://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) now supports many solvers: Perhaps we should consider making the `solver` grid-searchable (and perhaps also the `penalty` option)?
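One wrinkle worth noting: not every solver supports every penalty, so the search space would need to be a list of compatible sub-grids rather than a plain cross-product. A sketch (toy data; the grid below is illustrative, not SKLL's actual default space):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=1)

# A list of dicts keeps solver/penalty combinations valid:
# lbfgs only handles l2, while liblinear handles l1 and l2.
param_grid = [
    {"solver": ["lbfgs"], "penalty": ["l2"]},
    {"solver": ["liblinear"], "penalty": ["l1", "l2"]},
]

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
search.fit(X, y)
best = search.best_params_
```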