Dylan Sherry

15 issues by Dylan Sherry

See https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html for the scikit-learn implementation. We will need to run perf tests. Ideally, we can come up with some key examples where ordinal encoding outperforms one-hot encoding.

new feature
needs design
good first issue
spike
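For the ordinal-encoding issue above, here is a minimal sketch of how the two encoders differ in output shape, using scikit-learn's `OrdinalEncoder` and `OneHotEncoder` on a hypothetical three-level column. It is only a starting point for the perf tests, not a benchmark: the toy data and the explicit category order are assumptions.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical ordered categorical column (object dtype, one feature).
X = np.array([["low"], ["medium"], ["high"], ["medium"]], dtype=object)
levels = [["low", "medium", "high"]]  # explicit order so codes follow low < medium < high

# Ordinal encoding: one numeric column, codes 0..n_levels-1.
ordinal = OrdinalEncoder(categories=levels)
X_ord = ordinal.fit_transform(X)

# One-hot encoding: one binary column per level (sparse by default, densified here).
one_hot = OneHotEncoder(categories=levels)
X_ohe = one_hot.fit_transform(X).toarray()

print(X_ord.ravel())   # single column of codes
print(X_ohe.shape)     # (n_rows, n_levels)
```

The width difference is the main lever: ordinal encoding keeps high-cardinality columns to one feature, which tree-based models can split on directly, while linear models will treat the codes as an ordering that may not exist.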

We currently use logistic regression for classification and linear regression for regression. I bet lasso would perform better!

new feature
performance
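One way to start testing the lasso hunch is a cross-validated comparison on data where most features are noise. This is a sketch on synthetic `make_regression` data with an arbitrary `alpha=1.0`; it does not say which model wins on real datasets, which is what the perf tests would need to establish.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: 50 features, only 5 of which carry signal (assumed setup).
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# 5-fold cross-validated R^2 for plain least squares vs. lasso.
ols_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
lasso_scores = cross_val_score(Lasso(alpha=1.0), X, y, cv=5, scoring="r2")
print(f"linear R^2: {ols_scores.mean():.3f}, lasso R^2: {lasso_scores.mean():.3f}")

# The L1 penalty drives coefficients of uninformative features to exactly zero.
lasso = Lasso(alpha=1.0).fit(X, y)
n_nonzero = np.count_nonzero(lasso.coef_)
print(f"lasso kept {n_nonzero} of {X.shape[1]} features")
```

For classification, the analogous model is L1-penalized logistic regression, e.g. `LogisticRegression(penalty="l1", solver="liblinear")`.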

Overfitting protection @rpeck

enhancement
performance

Good at handling imbalanced data: https://www3.nd.edu/~nchawla/papers/DMKD11.pdf. I couldn't find a widely used Python implementation. I did find [this](https://github.com/EvgeniDubov/hellinger-distance-criterion#example), which looks like it uses some Cython. The same author wrote [this nice...

new feature
needs design
performance
new component
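To make the linked paper concrete, here is a sketch of the binary-class Hellinger split criterion as I understand it: for a candidate split into partitions, compare the square-rooted fraction of all positives and all negatives that fall in each partition. The function name `hellinger_split_distance` is hypothetical, and this is plain Python for illustration, not the Cython criterion from the linked repo.

```python
import math
from typing import Sequence


def hellinger_split_distance(partitions: Sequence[Sequence[int]]) -> float:
    """Hellinger distance of a split, for binary labels (1 = positive, 0 = negative).

    d = sqrt( sum_j ( sqrt(pos_j / n_pos) - sqrt(neg_j / n_neg) ) ** 2 )
    Larger values mean the split separates the classes better.
    """
    all_labels = [label for part in partitions for label in part]
    n_pos = sum(1 for label in all_labels if label == 1)
    n_neg = len(all_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # only one class present: the distance is undefined, treat as uninformative

    total = 0.0
    for part in partitions:
        pos = sum(1 for label in part if label == 1)
        neg = len(part) - pos
        # Each term uses class-conditional fractions, not the class prior.
        total += (math.sqrt(pos / n_pos) - math.sqrt(neg / n_neg)) ** 2
    return math.sqrt(total)


perfect = hellinger_split_distance([[1, 1], [0, 0, 0, 0]])
useless = hellinger_split_distance([[1, 0], [1, 0]])
```

Because each term only uses the fraction of a class landing in a partition, adding more negatives in the same proportions leaves the score unchanged; that skew insensitivity is why the criterion is attractive for imbalanced data.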

See https://stackoverflow.com/a/59076067. Setting a timeout on the overall runtime of the unit tests could help prevent us from adding a bunch of short-running tests which cumulatively...

enhancement
testing
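A suite-wide budget could be enforced from a `conftest.py` using pytest's `pytest_sessionstart` and `pytest_sessionfinish` hooks. This is a sketch under assumptions: the 20-minute budget, the `over_budget` helper, and the choice to fail via `session.exitstatus = 1` are all hypothetical, and the linked Stack Overflow answer may take a different approach.

```python
# conftest.py (sketch): fail the run if the whole test session exceeds a time budget.
import time

# Hypothetical budget for the entire unit-test suite, in seconds.
TOTAL_RUNTIME_BUDGET_S = 20 * 60

_start_time = None


def over_budget(elapsed_s, budget_s=TOTAL_RUNTIME_BUDGET_S):
    """Return True when the elapsed wall-clock time exceeds the budget."""
    return elapsed_s > budget_s


def pytest_sessionstart(session):
    # Record wall-clock start once, before collection and test execution.
    global _start_time
    _start_time = time.monotonic()


def pytest_sessionfinish(session, exitstatus):
    # Runs after all tests; overriding exitstatus makes the run fail.
    if _start_time is None:
        return
    elapsed = time.monotonic() - _start_time
    if over_budget(elapsed):
        print(f"\nTest suite took {elapsed:.0f}s, over the {TOTAL_RUNTIME_BUDGET_S}s budget")
        session.exitstatus = 1
```

Measuring the whole session (rather than per test, as `pytest-timeout` does) is what catches many individually fast tests that add up.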

[Cloudwatch link](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Fecs$252Fevalml-test/log-events/ecs$252Fprod-evalml$252F9da7d5e7ccc44ca6a2fdd9c3943d1997)

The warning being printed:

```
Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.3056943635107245, tolerance: 0.021498518468096656
```

Unclear which dataset this...

bug
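That message appears to come from scikit-learn's coordinate-descent solvers (used by `Lasso`/`ElasticNet`), which raise it as a `ConvergenceWarning`. To track down which dataset triggers it, one option is to record the warnings per fit. This sketch reproduces the warning by deliberately starving `ElasticNet` of iterations; the helper name `convergence_warnings` is hypothetical, and the exact warning wording varies across scikit-learn versions.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import ElasticNet


def convergence_warnings(model, X, y):
    """Fit `model` and return the messages of any ConvergenceWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    return [str(w.message) for w in caught if issubclass(w.category, ConvergenceWarning)]


X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# One iteration with a tiny tolerance cannot converge, so the warning fires.
starved = convergence_warnings(ElasticNet(alpha=0.01, max_iter=1, tol=1e-12), X, y)
for message in starved:
    print(message)
```

Logging the dataset name alongside any non-empty result would answer the open question; raising `max_iter` or scaling the features are the usual remedies once it is found.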

Currently Woodwork will detect data of the form

```
$1.234
$5.678
...
```

as "Natural Language". It would be helpful if we created a currency type so that this sort...

needs design
new feature
evalml
spike
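As a starting point for the currency-type design, a simple regex heuristic could flag such columns and convert them to numbers. Everything here is hypothetical (the pattern, the `looks_like_currency` and `parse_currency` names, the 95% match threshold) and is not part of Woodwork's API; real inference would also need to handle locale-specific separators.

```python
import re

# Hypothetical pattern: optional minus, a currency symbol, digits with optional
# thousands separators, and an optional decimal part.
CURRENCY_PATTERN = re.compile(r"^\s*-?[$€£¥]\s?\d[\d,]*(\.\d+)?\s*$")


def looks_like_currency(values, threshold=0.95):
    """Return True if at least `threshold` of the non-empty values match CURRENCY_PATTERN."""
    non_empty = [v for v in values if v is not None and str(v).strip() != ""]
    if not non_empty:
        return False
    matches = sum(1 for v in non_empty if CURRENCY_PATTERN.match(str(v)))
    return matches / len(non_empty) >= threshold


def parse_currency(value):
    """Convert a string such as "$1,234.50" to a float by dropping symbols and separators."""
    cleaned = re.sub(r"[$€£¥,\s]", "", str(value))
    return float(cleaned)


print(looks_like_currency(["$1.234", "$5.678"]))
print(parse_currency("$1,234.50"))
```

A threshold below 100% tolerates a few malformed entries without misclassifying free text, which rarely starts with a currency symbol followed by digits.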

**Goal:** If users want to compute graphs and stats on any of the models trained during each CV fold, we should design an API that allows them to do so...

new feature
needs design
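One possible shape for the per-fold API is to return each fold's fitted model together with its train/test indices, so any graph or statistic can be computed afterwards. This sketch uses scikit-learn directly; `FoldResult` and `cross_validate_keep_models` are hypothetical names, not evalml API, and the stratified 3-fold split is an assumption.

```python
from dataclasses import dataclass

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold


@dataclass
class FoldResult:
    """Hypothetical container: one trained model plus the data split it saw."""
    model: object
    train_index: np.ndarray
    test_index: np.ndarray


def cross_validate_keep_models(estimator, X, y, n_splits=3, random_state=0):
    """Fit a fresh clone of `estimator` per fold and keep every fitted model."""
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    folds = []
    for train_idx, test_idx in splitter.split(X, y):
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        folds.append(FoldResult(model, train_idx, test_idx))
    return folds


X, y = make_classification(n_samples=300, random_state=0)
folds = cross_validate_keep_models(LogisticRegression(max_iter=1000), X, y)

# Any per-fold graph or statistic is now possible, e.g. a confusion matrix per fold.
per_fold_cm = [confusion_matrix(y[f.test_index], f.model.predict(X[f.test_index]))
               for f in folds]
```

Keeping the indices next to each model matters: without them, users cannot reconstruct which holdout rows a given fold's model should be evaluated on.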

Many of our components can support [sparse matrices](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) as input, which is critical for large sparse datasets. Note that in addition to figuring out which components and pipelines could support this without...

new feature
needs design
performance
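To illustrate why this matters, the sketch below builds a random CSR matrix with SciPy, compares its storage against a dense float64 copy, and fits a scikit-learn `LogisticRegression` on it directly. The shape, density, and random labels are arbitrary assumptions; which of our components accept sparse input is the open question in the issue.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# 1,000 rows x 5,000 columns with 0.1% non-zero entries, stored as CSR.
X = sparse.random(1000, 5000, density=0.001, format="csr", random_state=0)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)  # random binary labels, for illustration only

# Bytes for a dense float64 copy vs. the three arrays CSR actually stores.
dense_bytes = X.shape[0] * X.shape[1] * np.dtype(np.float64).itemsize
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
print(f"dense: {dense_bytes:,} bytes, CSR: {sparse_bytes:,} bytes")

# This estimator accepts the CSR matrix as-is, without densifying it.
model = LogisticRegression(max_iter=1000).fit(X, y)
```

Any component that calls `.toarray()` or converts to a pandas DataFrame internally would lose this saving, so those conversions are what an audit would look for.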

In the usability blitz, @christopherbunn showed [what looked like an 8x speedup in wall-clock runtime](https://alteryx.quip.com/gwW9AQg5m0Nq/Evalml-Usability-Blitz-April-2020#PFNACAoq60N) when GPU support was enabled in our CatBoost component. So yes, GPUs are awesome :)...

new feature
needs design
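CatBoost selects its device through the `task_type` (`"CPU"` or `"GPU"`) and `devices` parameters. A minimal sketch of how a component might build those parameters, with a fallback to CPU when no GPU is visible; the helper names `catboost_device_params` and `gpu_count` are hypothetical, and defaulting to device `"0"` is an assumption.

```python
def gpu_count():
    """Return the number of GPUs CatBoost can see, or 0 if CatBoost is not installed."""
    try:
        from catboost.utils import get_gpu_device_count
    except ImportError:
        return 0
    return get_gpu_device_count()


def catboost_device_params(use_gpu, devices="0"):
    """Build the device-related CatBoost parameters for a component."""
    params = {"task_type": "GPU" if use_gpu else "CPU"}
    if use_gpu:
        params["devices"] = devices  # e.g. "0", "0:1", or a range like "0-3"
    return params


# Usage: enable the GPU only when one is actually available, e.g.
#   CatBoostClassifier(**catboost_device_params(use_gpu=gpu_count() > 0))
params = catboost_device_params(use_gpu=gpu_count() > 0)
print(params)
```

Falling back to CPU keeps the same pipeline runnable on CI machines without GPUs, which is likely part of the design question.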