evalml
EvalML is an AutoML library written in Python.
In #2667, we added the ability for AutoMLSearch to create its own Dask LocalCluster to perform work in parallel. An important aspect of using a Dask LocalCluster is turning...
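A minimal sketch (assumed setup, not EvalML's actual internals) of spinning up a Dask LocalCluster, submitting work to it, and shutting it down cleanly, which is the kind of lifecycle AutoMLSearch has to manage when it owns the cluster:

```python
from dask.distributed import Client, LocalCluster

# processes=False keeps workers in-process (threads), which is enough for a demo
cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False,
                       dashboard_address=None)
client = Client(cluster)

# A toy task standing in for a pipeline fit/score call
result = client.submit(lambda x: x + 1, 41).result()
print(result)  # 42

# Tearing the cluster down cleanly is the part the issue is concerned with
client.close()
cluster.close()
```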
Preview of the error: We've noticed this for several of our perf-testing datasets, one in particular being `regress.csv`. CloudWatch link [here](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Fecs$252Fevalml-test/log-events/ecs$252Fprod-evalml$252F9da7d5e7ccc44ca6a2fdd9c3943d1997). The warning being printed: ``` 2021-09-09T17:33:44.934-04:00 |...
[CloudWatch link](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Fecs$252Fevalml-test/log-events/ecs$252Fprod-evalml$252F9da7d5e7ccc44ca6a2fdd9c3943d1997)

The warning being printed:

```
Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.3056943635107245, tolerance: 0.021498518468096656
```

Unclear which dataset this...
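This warning comes from scikit-learn's coordinate-descent solvers (e.g. `Lasso`/`ElasticNet`). A hedged sketch on synthetic data (not the perf-testing datasets) showing how the warning is triggered when `max_iter` is exhausted before the duality gap drops below `tol`, and how raising `max_iter` makes it go away:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20) + rng.normal(size=100)

# Deliberately starve the solver of iterations to trigger the warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ElasticNet(max_iter=2, tol=1e-10).fit(X, y)
starved = any(issubclass(w.category, ConvergenceWarning) for w in caught)

# With enough iterations (and the default tol) the solver converges quietly
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ElasticNet(max_iter=10000).fit(X, y)
converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print(starved, converged)
```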
## Reasoning

It is not clear that AutoMLSearch is tuning the pipeline threshold during search.
- It's not mentioned in our docs: https://evalml.alteryx.com/en/stable/user_guide/automl.html
- It's not mentioned in the logs...
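For context, a toy sketch (not EvalML's actual implementation) of what tuning a binary pipeline's threshold means: pick the decision threshold on held-out predicted probabilities that maximizes an objective such as F1, rather than defaulting to 0.5. The `tune_threshold` helper is hypothetical:

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(y_true, y_proba, candidates=None):
    """Return the candidate threshold with the best F1 on the holdout."""
    if candidates is None:
        candidates = np.linspace(0.01, 0.99, 99)
    scores = [f1_score(y_true, y_proba >= t) for t in candidates]
    return candidates[int(np.argmax(scores))]

# Toy holdout: probabilities are informative but the classes are imbalanced,
# so the F1-optimal threshold lands well below the default 0.5
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_proba = np.array([0.05, 0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.9])
best = tune_threshold(y_true, y_proba)
print(best)
```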
When running the nyc_taxi.csv dataset (~1.5M rows, 19 columns) through AutoMLSearch, it uses more than 30 GB of memory when fitting an ensembling pipeline. This excessive...
Related to [this discussion](https://github.com/alteryx/evalml/pull/3373#discussion_r832473688). With [this PR](https://github.com/alteryx/evalml/pull/3373) we have consolidated most of the pipeline parameter-creating logic into the `AutoMLAlgorithm` class. However, there are still cases, like with the sampler parameters,...
Once the critical infrastructure for clustering is built, but before all the changes are added to `main`, we need to make sure we have comprehensive documentation for users to...
In #3419 we added a fix to handle Email and URL features in the default algorithm by backtracking through the provenance of such features, looking at not only the `EmailFeaturizer` and...
The `TimeSeriesRegularizer` currently sets the `window_length` to 5 and `threshold` to 0.8. This was done to support smaller datasets that wouldn't pass an `infer_frequency` check from WW with higher `window_length`...
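As a hedged illustration of the underlying frequency-inference problem, using pandas' `pd.infer_freq` as a stand-in for the Woodwork check the issue refers to: a regular datetime index has a cleanly inferable frequency, while one with dropped observations does not, which is the situation `TimeSeriesRegularizer` is meant to repair.

```python
import pandas as pd

# A regular daily index is inferred cleanly...
regular = pd.date_range("2022-01-01", periods=10, freq="D")
print(pd.infer_freq(regular))  # "D"

# ...but dropping observations makes the overall frequency un-inferable
irregular = regular.delete([3, 7])
print(pd.infer_freq(irregular))  # None
```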
As of scikit-learn 0.22, `ccp_alpha` has been added as a pruning parameter for Decision Trees, Extra Trees, and Random Forests. Adding it as a hyperparameter would give AutoML an...
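A short sketch of what `ccp_alpha` does and how a tuner could bound its search space: larger values prune more aggressively, and `cost_complexity_pruning_path` exposes the effective alphas at which subtrees fall away. The dataset here is a stand-in, not one of EvalML's benchmarks:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A larger ccp_alpha yields a smaller (pruned) tree
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print(unpruned.tree_.node_count, pruned.tree_.node_count)

# The pruning path gives candidate alphas a hyperparameter search could draw from
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas[:3])
```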