Nitin Madnani
We can use [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler) for sparse data. Right now, we convert things to dense if we want to do scaling.
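A minimal sketch of the idea (data here is illustrative): `MaxAbsScaler` accepts scipy sparse input directly, so the densification step could be skipped.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# A small sparse feature matrix; no conversion to dense is needed.
X = csr_matrix(np.array([[1.0, -2.0], [2.0, 0.0], [4.0, 1.0]]))

# Each column is divided by its maximum absolute value,
# which preserves sparsity (zeros stay zeros).
scaled = MaxAbsScaler().fit_transform(X)
```

The output remains a CSR matrix, so downstream sparse-aware learners are unaffected.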
Should be fairly easy since it's pretty much like AdaBoost in terms of integration.
It would be nice to have a single requirements file, and it should be possible since we have done something similar for [RSMTool](https://github.com/EducationalTestingService/rsmtool).
Scikit-learn's `learning_curve` method in 0.22 will include an option to return the time it took to fit each point in the learning curve, which could provide another useful parameter...
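The new option is the `return_times` flag: when set, `learning_curve` additionally returns per-point fit and score times. A hedged sketch (toy data, arbitrary estimator):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=200, random_state=42)

# With return_times=True, two extra arrays come back:
# fit_times and score_times, one row per training-set size,
# one column per CV fold.
sizes, train_scores, test_scores, fit_times, score_times = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=[0.5, 1.0],
    cv=3,
    return_times=True,
)
```

Those timing arrays could be surfaced in SKLL's learning-curve output alongside the scores.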
Looks like the `roc_auc` metric will support multi-class classification in sklearn 0.22.
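For reference, the multi-class support works through the `multi_class` argument of `roc_auc_score`, which takes per-class probability estimates (toy values below are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 2])

# Probability estimates, one column per class; rows must sum to 1
# for the multi-class case.
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
])

# "ovr" = one-vs-rest averaging over the classes.
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")
```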
One way to do this is via https://github.com/civisanalytics/civisml-extensions I am not sure what the best approach for incorporation is: (a) include their code as a dependency and use it as...
There is a [new experimental implementation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html#sklearn.ensemble.HistGradientBoostingClassifier) for GBTs which is supposed to be orders of magnitude faster than the vanilla GBTs for N > O(10K), which is not an uncommon...
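A sketch of how it would be used (toy data; note that in the 0.21/0.22 releases the estimator sits behind an experimental-import flag, while later releases expose it directly):

```python
try:
    # Recent scikit-learn: plain import works.
    from sklearn.ensemble import HistGradientBoostingClassifier
except ImportError:
    # 0.21/0.22: the experimental flag must be imported first.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
    from sklearn.ensemble import HistGradientBoostingClassifier

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# Histogram-based GBT: bins continuous features, which is where
# the large-N speedup comes from.
clf = HistGradientBoostingClassifier(max_iter=50, random_state=0).fit(X, y)
acc = clf.score(X, y)
```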
It would be nice to have some more feature pre-processing functionality in SKLL, e.g., feature truncation and transformation sort of like what's in RSMTool. We could have a separate `preprocessing`...
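One possible shape for this, sketched as a scikit-learn-style transformer (the class name is hypothetical, and the mean ± k·SD truncation rule is only an RSMTool-like example, not a committed design):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class FeatureTruncator(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each feature to its
    training mean +/- k standard deviations (RSMTool-style
    outlier truncation)."""

    def __init__(self, k=4.0):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        mean, std = X.mean(axis=0), X.std(axis=0)
        self.lower_ = mean - self.k * std
        self.upper_ = mean + self.k * std
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


# One column with an extreme outlier; a small k makes the clipping visible.
X = np.array([[0.0], [1.0], [2.0], [100.0]])
Xt = FeatureTruncator(k=1.0).fit_transform(X)
```

Because it follows the fit/transform protocol, such a step would slot into a `Pipeline` ahead of the learner.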
Investigate whether using something like [optuna](https://optuna.org) is better than plain ol' grid search when it comes to the non-deep-learning algorithms that are in SKLL/scikit-learn.
[`LogisticRegression`](https://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) now supports many solvers: Perhaps we should consider making the `solver` grid-searchable (and perhaps also the `penalty` option)?
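One wrinkle worth noting: not every solver supports every penalty, so the search space would need to be a list of compatible sub-grids rather than a plain cross-product. A sketch (toy data; the grid below is illustrative, not SKLL's actual default space):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=1)

# A list of dicts keeps solver/penalty combinations valid:
# lbfgs only handles l2, while liblinear handles l1 and l2.
param_grid = [
    {"solver": ["lbfgs"], "penalty": ["l2"]},
    {"solver": ["liblinear"], "penalty": ["l1", "l2"]},
]

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
search.fit(X, y)
best = search.best_params_
```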