Jano Roelandt
Resolves #171. This PR refactors the existing `compute_univariate_preselection` method according to the DRY principle. It also prepares for passing a custom scoring metric.
This method currently handles both classification and regression. It can be simplified, since the regression part duplicates a lot of code from the classification part.
Refactor code according to PEP8. This makes PR #132 obsolete. We only use black, rather than the other linters and type checks proposed in that PR.
This method currently plots and compares the predictions versus the actuals. No post-processing of the data is done, which makes the plot visually unattractive, especially for big test sets....
Since not all features need the same number of bins, we should support, for example, a dictionary that specifies `n_bins` for every feature. Not...
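One way the per-feature bin counts could be resolved is sketched below. The helper name `resolve_n_bins` and the default of 10 bins are assumptions made for illustration; they are not part of the existing API.

```python
from typing import Dict, List, Union


def resolve_n_bins(
    features: List[str],
    n_bins: Union[int, Dict[str, int]],
    default: int = 10,  # assumed fallback, not taken from the library
) -> Dict[str, int]:
    """Return the number of bins to use for every feature (hypothetical helper)."""
    # A single integer keeps the current behaviour: same bin count everywhere.
    if isinstance(n_bins, int):
        return {feature: n_bins for feature in features}

    # A dictionary may only mention features that actually exist.
    unknown = sorted(set(n_bins) - set(features))
    if unknown:
        raise ValueError(f"n_bins given for unknown features: {unknown}")

    # Features missing from the dictionary fall back to the default.
    return {feature: n_bins.get(feature, default) for feature in features}
```

With this shape, callers who pass a plain integer would see no change, while a dictionary only needs to list the features that deviate from the default.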
When the thresholds are so low that none of the features are selected, an error is raised: `min() arg cannot be empty`, which makes sense, but is not easy...
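A possible guard for the empty-selection case is sketched below. The function `select_features`, its arguments, and the "keep scores at or above the threshold" rule are illustrative assumptions, not the library's actual selection logic; the point is only to fail early with a message that names the cause.

```python
from typing import Dict, List


def select_features(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose score reaches the threshold (hypothetical helper)."""
    selected = [name for name, score in scores.items() if score >= threshold]

    # Fail early with an explicit message instead of letting a later
    # min()/max() call on an empty list raise a cryptic error.
    if not selected:
        best = max(scores.values(), default=None)
        raise ValueError(
            f"No features passed the threshold {threshold}; "
            f"best available score was {best}. Consider relaxing the threshold."
        )
    return selected
```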
`compute_univariate_preselection` automatically chooses the metric according to the type of model: AUC for classification, RMSE for regression. We could add the possibility to let end users give their own scoring function as...
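The idea of a user-supplied metric with model-type defaults could look roughly like the sketch below. The names `resolve_scoring`, `model_type`, and `scoring_function` are assumptions for illustration, and the pure-Python `auc` and `rmse` stand in for whatever metric implementations the library really uses.

```python
from math import sqrt
from typing import Callable, Optional, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def auc(y_true: Sequence[float], y_score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def resolve_scoring(model_type: str,
                    scoring_function: Optional[Scorer] = None) -> Scorer:
    """Use the caller's metric if given, else the default for the model type."""
    if scoring_function is not None:
        return scoring_function
    defaults = {"classification": auc, "regression": rmse}
    if model_type not in defaults:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return defaults[model_type]
```

One design point to settle with a custom metric is its direction: AUC is better when higher, RMSE when lower, so a user-supplied function would probably need to state which applies.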
Task: Should we have a possibility to process PySpark DataFrames? Currently at Telenet there is a use case in which they use PySpark DataFrames and they would like to use...