Dylan Sherry issues

Results: 15 issues of Dylan Sherry

Add Ordinal encoder component

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html. We will need to run performance tests. Ideally, we can come up with some key examples where ordinal encoding outperforms one-hot encoding.
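To make the comparison concrete, here is a minimal plain-Python sketch of the two encodings. It does not use the scikit-learn class linked above; the helper names `ordinal_encode` and `one_hot_encode` and the `sizes` feature are hypothetical, chosen only for illustration. A feature with a natural order, like the one below, is one plausible candidate for a case where ordinal encoding helps, but that would still need to be confirmed by the performance tests.

```python
def ordinal_encode(values, categories):
    """Map each value to its index in the ordered `categories` list."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values, categories):
    """Map each value to a 0/1 indicator vector with one slot per category."""
    return [[1 if v == cat else 0 for cat in categories] for v in values]


# Hypothetical feature with a natural order (assumption, not from the issue).
sizes = ["small", "large", "medium", "small"]
order = ["small", "medium", "large"]

ordinal = ordinal_encode(sizes, order)  # one integer column
one_hot = one_hot_encode(sizes, order)  # one column per category
```

The ordinal version keeps a single column and preserves the order of the categories, while the one-hot version produces one column per category and discards the order.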

new feature

needs design

good first issue

spike

Ability to use more meta-learner models for stacked ensembles

We currently use logistic regression for classification and linear regression for regression. I bet lasso would perform better!

new feature

performance

Select pipelines to pass to ensembling based on correlation of the residual

Overfitting protection @rpeck
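One possible reading of this idea, sketched under assumptions: compute each pipeline's residuals on held-out data, then greedily keep only pipelines whose residuals are not strongly correlated with those of a pipeline already kept. The function names, the `max_corr` threshold, and the sample data are hypothetical and are not part of any existing API.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_residual_correlation(y_true, predictions, max_corr=0.9):
    """Greedily keep pipelines whose residuals differ from those already kept.

    `predictions` maps a pipeline name to its predictions and is assumed to be
    ordered best-first, so a better pipeline wins over a correlated worse one.
    """
    residuals = {
        name: [t - p for t, p in zip(y_true, preds)]
        for name, preds in predictions.items()
    }
    selected = []
    for name in predictions:
        if all(
            abs(pearson(residuals[name], residuals[kept])) <= max_corr
            for kept in selected
        ):
            selected.append(name)
    return selected


# Hypothetical held-out targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "pipeline_a": [1.1, 1.9, 3.2, 3.8, 5.1],
    "pipeline_b": [1.2, 1.8, 3.4, 3.6, 5.2],  # errors mirror pipeline_a's
    "pipeline_c": [0.9, 2.1, 3.1, 4.1, 4.8],
}
selected = select_by_residual_correlation(y_true, predictions)
```

Here `pipeline_b` makes the same mistakes as `pipeline_a` at twice the size, so it is dropped, while `pipeline_c` errs differently and is kept.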

enhancement

performance

Estimator: Hellinger distance decision trees (HDDT)

Good at handling imbalanced data: https://www3.nd.edu/~nchawla/papers/DMKD11.pdf I couldn't find a widely used Python implementation. I did find [this](https://github.com/EvgeniDubov/hellinger-distance-criterion#example), which appears to use some Cython. The same author wrote [this nice...
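For reference, the split criterion behind HDDT can be sketched in a few lines for a two-class problem and a binary split. This is a minimal illustration of the Hellinger distance itself, not the linked Cython implementation; the function name and argument layout are assumptions.

```python
import math


def hellinger_distance(left_pos, left_neg, right_pos, right_neg):
    """Hellinger distance of a binary split for a two-class problem.

    Compares how the positive class and the negative class distribute across
    the two branches. Each class is normalized by its own total, so the value
    does not depend on the class ratio.
    """
    total_pos = left_pos + right_pos
    total_neg = left_neg + right_neg
    if total_pos == 0 or total_neg == 0:
        return 0.0  # only one class present, nothing to separate
    distance_sq = 0.0
    for pos, neg in ((left_pos, left_neg), (right_pos, right_neg)):
        distance_sq += (math.sqrt(pos / total_pos) - math.sqrt(neg / total_neg)) ** 2
    return math.sqrt(distance_sq)


# A split that perfectly separates 10 positives from 1000 negatives.
perfect = hellinger_distance(10, 0, 0, 1000)
# A split that sends both classes to each branch in equal proportion.
uninformative = hellinger_distance(5, 500, 5, 500)
```

The perfect split reaches the maximum value of `sqrt(2)` despite the 1:100 class ratio, and the uninformative split scores 0.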

new feature

needs design

performance

new component

Set overall timeout on unit test jobs

See https://stackoverflow.com/a/59076067. If we set a timeout on the overall runtime of the unit tests, that could help prevent us from adding a bunch of short-running tests which cumulatively...

enhancement

testing

Warning in automl: "Objective did not converge"

[Cloudwatch link](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Fecs$252Fevalml-test/log-events/ecs$252Fprod-evalml$252F9da7d5e7ccc44ca6a2fdd9c3943d1997)

The warning being printed:

```
Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.3056943635107245, tolerance: 0.021498518468096656
```

Unclear which dataset this...
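One way to narrow down where such a warning comes from is to record warnings per dataset with the standard `warnings` module. The sketch below is only an illustration: `run_and_collect_warnings`, the job callables, and the dataset names are hypothetical stand-ins, not part of the automl code.

```python
import warnings


def run_and_collect_warnings(jobs):
    """Run each job and record the warnings it emits, keyed by dataset name.

    `jobs` maps a dataset name to a zero-argument callable.
    """
    collected = {}
    for name, job in jobs.items():
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # keep filters from hiding repeats
            job()
        collected[name] = [str(w.message) for w in caught]
    return collected


def noisy_fit():
    # Stand-in for a model fit that fails to converge.
    warnings.warn("Objective did not converge.", UserWarning)


def quiet_fit():
    # Stand-in for a model fit that converges without warnings.
    pass


report = run_and_collect_warnings({"dataset_a": noisy_fit, "dataset_b": quiet_fit})
```

Any dataset with a non-empty list in `report` is a candidate for closer inspection.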

bug

Add a logical type Currency

Currently woodwork will detect data of the form

```
$1.234
$5.678
...
```

as "Natural Language". It would be helpful if we created a currency type so that this sort...
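A rough sketch of what detection for such a type might look like, assuming a regex-based check over the non-empty values of a column. The pattern, the set of currency symbols, the `threshold` parameter, and the function name are all assumptions made for illustration; none of this is existing woodwork API.

```python
import re

# Hypothetical pattern: optional minus sign, a currency symbol, digits with
# optional thousands separators, and an optional decimal part.
CURRENCY_PATTERN = re.compile(
    r"^\s*-?[$€£¥]\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?\s*$"
)


def looks_like_currency(values, threshold=0.95):
    """Return True if at least `threshold` of the non-empty values match."""
    non_null = [v for v in values if v is not None and str(v).strip()]
    if not non_null:
        return False
    matches = sum(1 for v in non_null if CURRENCY_PATTERN.match(str(v)))
    return matches / len(non_null) >= threshold


is_currency = looks_like_currency(["$1.234", "$5.678", None])
is_text = looks_like_currency(["$1.234", "hello world"])
```

A threshold below 1.0 tolerates a small share of malformed entries. Whether that is desirable is a design question for the type itself.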

needs design

new feature

evalml

spike

Model debugging: Add ability to compute and store graphs/stats on each CV fold during automl search

**Goal** If users intend to compute graphs and stats on any of the models trained during each CV fold, we should design an API which allows them to do so....

new feature

needs design

Sparse matrix support

Many of our components can support passing [sparse matrices](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html), which is critical for large sparse datasets. Note in addition to figuring out which components and pipelines could support this without...

new feature

needs design

performance

Add support for GPU acceleration

In the usability blitz, @christopherbunn showed [what looked like an 8x speedup in wall-clock runtime](https://alteryx.quip.com/gwW9AQg5m0Nq/Evalml-Usability-Blitz-April-2020#PFNACAoq60N) when GPU support was enabled in our catboost component. So yes, GPUs are awesome :)...

new feature

needs design