Dylan Sherry
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html We will need to run performance tests. Ideally, we can come up with some key examples where ordinal encoding outperforms one-hot encoding.
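For reference, here is a minimal sketch of how the two encodings differ in output shape, using plain scikit-learn rather than any evalml component. The toy column and its category order are made up for illustration, and this says nothing about which encoding performs better; that still needs the perf tests.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy column with three categories (not from any real dataset).
X = np.array([["low"], ["medium"], ["high"], ["medium"]], dtype=object)

# Ordinal encoding: a single numeric column, using an explicit category order.
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]])
X_ordinal = ordinal.fit_transform(X)

# One-hot encoding: one column per category. The default output is a sparse
# matrix, so convert it to a dense array for inspection.
onehot = OneHotEncoder()
X_onehot = onehot.fit_transform(X).toarray()

print(X_ordinal.shape)  # one column regardless of the number of categories
print(X_onehot.shape)   # one column per distinct category
```

The ordinal output stays one column wide no matter how many categories there are, which is the main reason to expect it to help on high-cardinality features with tree-based models.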
We currently use logistic regression for classification and linear regression for regression. I bet lasso would perform better!
Overfitting protection @rpeck
Good at handling imbalanced data: https://www3.nd.edu/~nchawla/papers/DMKD11.pdf I couldn't find a widely used Python implementation. I did find [this](https://github.com/EvgeniDubov/hellinger-distance-criterion#example); it looks like there's some Cython going on. The same author wrote [this nice...
https://stackoverflow.com/a/59076067 If we set a timeout on the overall runtime of the unit tests, that could be helpful in preventing us from adding a bunch of short-running tests which cumulatively...
[Cloudwatch link](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Fecs$252Fevalml-test/log-events/ecs$252Fprod-evalml$252F9da7d5e7ccc44ca6a2fdd9c3943d1997) The warning being printed:
```
Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.3056943635107245, tolerance: 0.021498518468096656
```
Unclear which dataset this...
Currently woodwork will detect data of the form
```
$1.234
$5.678
...
```
as "Natural Language". It would be helpful if we created a currency type so that this sort...
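To make the idea concrete, here is a rough sketch of how currency-like strings could be detected and parsed. It uses only the Python standard library and is not Woodwork's API; the regex, the `looks_like_currency` and `parse_currency` helpers, and the 0.95 threshold are all assumptions chosen for illustration.

```python
import re

# Assumed pattern: optional sign, a currency symbol, digits (optionally with
# thousands separators), and an optional decimal part.
CURRENCY_RE = re.compile(
    r"^\s*-?[$€£]\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?\s*$"
)


def looks_like_currency(values, threshold=0.95):
    """Return True if at least `threshold` of the non-null values match."""
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return False
    matches = sum(1 for v in non_null if CURRENCY_RE.match(v))
    return matches / len(non_null) >= threshold


def parse_currency(value):
    """Strip the symbol and separators, keeping digits, sign, and decimal point."""
    return float(re.sub(r"[^\d.\-]", "", value))


column = ["$1.234", "$5.678", None]
if looks_like_currency(column):
    print([parse_currency(v) for v in column if v is not None])
```

A threshold below 1.0 tolerates a few malformed entries in an otherwise monetary column; whether that is the right default would need discussion.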
Model debugging: Add ability to compute and store graphs/stats on each CV fold during automl search
**Goal**
If users intend to compute graphs and stats on any of the models trained during each CV fold, we should design an API which allows them to do so....
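As a starting point for the design discussion, here is a sketch of what "compute and store stats per CV fold" could look like using plain scikit-learn. This is not an evalml API proposal; the `fold_results` structure, the synthetic dataset, and the choice of metrics are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic binary classification data, used only to keep the sketch runnable.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

fold_results = []  # hypothetical container: one dict per CV fold
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])

    # Score the held-out portion of this fold.
    proba = model.predict_proba(X[test_idx])[:, 1]
    preds = model.predict(X[test_idx])

    # Keep the fitted model next to its stats so graphs can be drawn later.
    fold_results.append(
        {
            "fold": fold,
            "model": model,
            "roc_auc": roc_auc_score(y[test_idx], proba),
            "confusion_matrix": confusion_matrix(y[test_idx], preds),
        }
    )

for result in fold_results:
    print(result["fold"], round(result["roc_auc"], 3))
```

Storing the fitted model per fold is the simplest option but can be memory-hungry; a callback that computes and stores only the stats would be an alternative.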
Many of our components can support passing [sparse matrices](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html), which is critical for large sparse datasets. Note that, in addition to figuring out which components and pipelines could support this without...
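For context, this is roughly what passing a sparse matrix straight through to an underlying estimator looks like. The sketch uses scikit-learn's `LogisticRegression` directly rather than an evalml component, and the random data and its sparsity level are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.linear_model import LogisticRegression

# Build a mostly-zero feature matrix (roughly 90% zeros) from random data.
rng = np.random.default_rng(0)
dense = rng.random((100, 20))
dense[dense < 0.9] = 0.0

# Arbitrary binary target derived from the row sums, just to have labels.
row_sums = dense.sum(axis=1)
y = (row_sums > np.median(row_sums)).astype(int)

# Store only the non-zero entries in CSR format.
X_sparse = csr_matrix(dense)

# The estimator accepts the sparse matrix without converting it to dense first.
model = LogisticRegression(max_iter=1000)
model.fit(X_sparse, y)
preds = model.predict(X_sparse)

print(issparse(X_sparse), X_sparse.nnz, dense.size)
```

The open question for evalml is which components in a pipeline would silently densify the input along the way, since that would defeat the memory savings.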
In the usability blitz, @christopherbunn showed [what looked like an 8x speedup in wall-clock runtime](https://alteryx.quip.com/gwW9AQg5m0Nq/Evalml-Usability-Blitz-April-2020#PFNACAoq60N) when GPU support was enabled in our catboost component. So yes, GPUs are awesome :)...