evalml
EvalML is an AutoML library written in Python.
### Pull Request Description
Added a data check that flags a dataset when more than 50% of its values are unknown. Closes #2478

-----
*After creating the pull request: in order...
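A minimal sketch of what such a check might look like, assuming "unknown" means a missing value (`None` or NaN) and that the 50% limit applies to the dataset as a whole. The function names `unknown_fraction` and `check_unknowns` are hypothetical and are not EvalML's API.

```python
import math


def unknown_fraction(rows):
    """Return the fraction of cells that are unknown (None or NaN)."""
    total = 0
    unknown = 0
    for row in rows:
        for value in row:
            total += 1
            if value is None or (isinstance(value, float) and math.isnan(value)):
                unknown += 1
    return unknown / total if total else 0.0


def check_unknowns(rows, threshold=0.5):
    """Hypothetical check: return a warning message if unknowns exceed the threshold."""
    fraction = unknown_fraction(rows)
    if fraction > threshold:
        return f"{fraction:.0%} of values are unknown, above the {threshold:.0%} limit"
    return None


# Four of the six cells are unknown, so the check returns a warning.
rows = [
    [1.0, None, None],
    [float("nan"), None, "b"],
]
message = check_unknowns(rows)
```

Returning a message instead of raising keeps the check advisory, so a caller can decide whether a high share of unknowns should block a search.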
- pyproject.toml is the future of Python package metadata and tool config
- "One file to rule them all"
- Examples
  - https://github.com/alteryx/featuretools/issues/2261
  - https://github.com/alteryx/woodwork/pull/1506
Extension of issue [470](https://github.com/alteryx/evalml/issues/470). PR [1454](https://github.com/alteryx/evalml/pull/1454) addresses adding the FeatureTools component, but only handles single dataframes/datatables. In order to use FeatureTools fully, we want to be able to use it...
Model debugging: Add ability to compute and store graphs/stats on each CV fold during automl search
**Goal** If users intend to compute graphs and stats on any of the models trained during each CV fold, we should design an API which allows them to do so....
Many of our components can support passing [sparse matrices](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html), which is critical for large sparse datasets. Note in addition to figuring out which components and pipelines could support this without...
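To illustrate why sparse input matters for large sparse datasets, here is a small sketch using SciPy's `csr_matrix`. The matrix size and values are made up for the example, and it says nothing about which EvalML components accept sparse input.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1000 x 1000 matrix with only three non-zero entries.
dense = np.zeros((1000, 1000))
dense[0, 1] = 1.0
dense[500, 2] = 2.0
dense[999, 999] = 3.0

# CSR stores only the non-zero values plus their column indices and row pointers.
sparse = csr_matrix(dense)

sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes
dense_bytes = dense.nbytes

print(f"non-zero entries: {sparse.nnz}")
print(f"dense storage:  {dense_bytes} bytes")
print(f"sparse storage: {sparse_bytes} bytes")
```

A component that converts its input to a dense array internally loses this saving, which is why support has to be verified component by component.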
In the usability blitz, @christopherbunn showed [what looked like an 8x speedup in wall-clock runtime](https://alteryx.quip.com/gwW9AQg5m0Nq/Evalml-Usability-Blitz-April-2020#PFNACAoq60N) when GPU support was enabled in our catboost component. So yes, GPUs are awesome :)...
This came up [in the usability blitz](https://alteryx.quip.com/gwW9AQg5m0Nq/Evalml-Usability-Blitz-April-2020#PFNACAJQ4nk). If an initial model has a poor score, the plot scale can make it totally unreadable.
RandomSearch/GridSearch tuners: if search space exhausted for one pipeline type, entire search stops
**Problem** #230 added `RandomSearch` and `GridSearch` tuners. Unlike the `SKOptTuner`, those tuners have potentially finite search spaces, and can eventually run out of parameters to suggest (particularly `GridSearch`). If automl...
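One possible way to handle an exhausted search space is to drop only the pipeline type whose tuner has run out and keep proposing parameters for the rest. The sketch below is an assumption about how that could work, not EvalML's implementation. `grid` and `round_robin_search` are hypothetical helpers, and the search spaces are invented for illustration.

```python
import itertools


def grid(space):
    """Yield every parameter combination in a finite grid exactly once."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def round_robin_search(spaces, max_iterations):
    """Propose parameters per pipeline type, dropping types whose grid is exhausted."""
    tuners = {name: grid(space) for name, space in spaces.items()}
    proposals = []
    while tuners and len(proposals) < max_iterations:
        # Iterate over a copy of the keys so exhausted tuners can be removed safely.
        for name in list(tuners):
            if len(proposals) >= max_iterations:
                break
            try:
                params = next(tuners[name])
            except StopIteration:
                # Only this pipeline type is finished; the others keep going.
                del tuners[name]
                continue
            proposals.append((name, params))
    return proposals


spaces = {
    "linear": {"penalty": ["l1", "l2"]},
    "forest": {"n_estimators": [10, 100], "max_depth": [2, 4, 8]},
}
proposals = round_robin_search(spaces, max_iterations=20)
```

Here the "linear" grid runs out after two proposals, but the search continues until the "forest" grid is also exhausted or the iteration budget is reached.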
Currently, we only have SelectFromModel. It would be nice to support some feature selectors (e.g., SelectKBest, SelectPercentile) that don't rely on an estimator and instead simply select features using statistical...
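For reference, this is how the two selectors named above behave in scikit-learn on their own. It is a sketch on synthetic data, with `f_classif` chosen as the scoring function for illustration, and it does not show how they would be wrapped as EvalML components.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic binary classification data: 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Keep the 3 features with the highest ANOVA F-scores; no estimator is fitted.
k_best = SelectKBest(score_func=f_classif, k=3)
X_k = k_best.fit_transform(X, y)

# Keep the top 50% of features ranked by the same score.
top_half = SelectPercentile(score_func=f_classif, percentile=50)
X_p = top_half.fit_transform(X, y)

# Column indices of the features SelectKBest retained.
kept = k_best.get_support(indices=True)
print(X_k.shape, X_p.shape, kept)
```

Unlike SelectFromModel, both selectors score each feature with a univariate statistic, so they need no trained model and are cheap to run before any pipeline is fitted.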