redflag
Safety net for machine learning pipelines. Plays nice with sklearn and pandas.
Adopt the simpler approach to dynamic versioning that I'm using in https://github.com/scienxlab/python-package-template and consider dropping `__version__` completely. Rationale: https://github.com/pypa/packaging.python.org/pull/1276#issuecomment-1646696925
In a regression task, it's good practice to compute **interactions** and **nonlinear transformations**, eg via polynomial basis expansion. It should not be too hard to detect if this has been...
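A minimal sketch of what such an expansion looks like and of one naive way to detect it, assuming scikit-learn's `PolynomialFeatures`. The helper `has_product_features` is hypothetical (it is not part of redflag) and only checks for exact pairwise products and squares:

```python
from itertools import combinations_with_replacement

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def has_product_features(X, rtol=1e-8):
    """Hypothetical check: is any column the product of two columns?

    Pairs include a column with itself, so squared terms are caught too.
    """
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    for i, j in combinations_with_replacement(range(n_cols), 2):
        product = X[:, i] * X[:, j]
        for k in range(n_cols):
            if k in (i, j):
                continue  # skip comparing a column against its own factors
            if np.allclose(X[:, k], product, rtol=rtol):
                return True
    return False


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Degree-2 expansion: 3 linear terms + 3 squares + 3 pairwise interactions.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(has_product_features(X), has_product_features(X_poly))
```

The check is quadratic in the number of pairs and would miss scaled or higher-degree terms, so it is only a starting point.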
Lasso may be a better indicator of feature importance, as it tries to eliminate features. But the alpha parameter needs to be tuned, eg with https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
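As a rough illustration, `LassoCV` can tune alpha by cross-validation and the absolute coefficients can then serve as an importance indicator. The synthetic data and the choice to standardize first are assumptions made for this sketch:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two of five features drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Standardize so that coefficient magnitudes are comparable across features.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

lasso = model.named_steps["lassocv"]
importance = np.abs(lasso.coef_)

print("alpha chosen by CV:", lasso.alpha_)
print("importance:", importance)
```

Features whose coefficients are shrunk to (or near) zero are the ones Lasso considers dispensable at the selected alpha.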
@kwinkunks Thanks a lot for creating a nice package. Here are some more detailed comments that I think could be helpful. 1. `wasserstein` could return a pandas DataFrame with appropriate...
Can estimate precision using a beta distribution; see https://www.rikvoorhaar.com/validation-size/ for details. Could perhaps also model the uncertainty on the accuracy estimate for the user, but that may be off-topic.
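One way this might look, assuming the quantity of interest is accuracy on a validation set and assuming a uniform Beta(1, 1) prior, so that the posterior after `n_correct` hits out of `n_total` is Beta(n_correct + 1, n_total - n_correct + 1). The function name `accuracy_interval` is hypothetical:

```python
from scipy.stats import beta


def accuracy_interval(n_correct, n_total, confidence=0.95):
    """Credible interval for accuracy under a uniform prior (an assumption)."""
    posterior = beta(n_correct + 1, n_total - n_correct + 1)
    return posterior.interval(confidence)


# Same observed accuracy (90%), very different validation-set sizes.
small = accuracy_interval(9, 10)
large = accuracy_interval(90, 100)

print("n=10: ", small)
print("n=100:", large)
```

The interval narrows as the validation set grows, which is the effect a user would want flagged when the set is too small to support the reported score.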
Would be nice to have... Here's one way https://github.com/steinwurf/versjon
Could be another way to measure the similarity between datasets. From the `twinning` repo: https://github.com/avkl/twinning > `energy()` computes the energy distance (Székely & Rizzo, 2013) between a given dataset and...
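For orientation, a small sketch of the (squared) energy distance between two multivariate samples, computed directly from pairwise Euclidean distances. This is an illustrative implementation under my own assumptions, not the `energy()` function from `twinning`, and it scales quadratically with sample size:

```python
import numpy as np
from scipy.spatial.distance import cdist


def energy_distance_sq(X, Y):
    """Squared energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|.

    Expectations are estimated by averaging over all sample pairs.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    between = cdist(X, Y).mean()
    within_x = cdist(X, X).mean()
    within_y = cdist(Y, Y).mean()
    return 2.0 * between - within_x - within_y


rng = np.random.default_rng(0)
A = rng.normal(loc=0.0, size=(200, 2))
B = rng.normal(loc=0.0, size=(200, 2))  # same distribution as A
C = rng.normal(loc=3.0, size=(200, 2))  # shifted distribution

print("A vs B:", energy_distance_sq(A, B))
print("A vs C:", energy_distance_sq(A, C))
```

Note that `scipy.stats.energy_distance` covers only the one-dimensional case, which is why the pairwise-distance form is used here.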
Can't use (say) +/- 3 standard deviations if the feature is non-Gaussian. So apply a transformation first, e.g. the Yeo-Johnson transformation; see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html and also #46
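A sketch of the idea, assuming a single skewed feature and scikit-learn's `PowerTransformer`. The helper name `outliers_after_yeo_johnson` and the lognormal test data are assumptions for illustration:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def outliers_after_yeo_johnson(x, threshold=3.0):
    """Flag points beyond `threshold` standard deviations after Yeo-Johnson."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    # standardize=True returns zero-mean, unit-variance values (z-scores).
    z = PowerTransformer(method="yeo-johnson", standardize=True).fit_transform(x)
    return np.abs(z.ravel()) > threshold


rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # strongly right-skewed

# Naive z-scores on the raw feature flag much of the long right tail.
raw_z = (x - x.mean()) / x.std()
n_raw = int((np.abs(raw_z) > 3).sum())
n_transformed = int(outliers_after_yeo_johnson(x).sum())

print("flagged on raw data:      ", n_raw)
print("flagged after Yeo-Johnson:", n_transformed)
```

On skewed data the raw rule tends to flag ordinary tail values, while the transformed version flags fewer points.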
Not filter! Need to find out what's causing them and fix it. Too much noise from warnings:

```
src/redflag/distributions.py::redflag.distributions.best_distribution
  overflow encountered in divide
src/redflag/distributions.py::redflag.distributions.cv_kde
src/redflag/distributions.py::redflag.distributions.fit_kde
src/redflag/distributions.py::redflag.distributions.get_kde
src/redflag/distributions.py::redflag.distributions.is_multimodal
src/redflag/distributions.py::redflag.distributions.is_multimodal
src/redflag/distributions.py::redflag.distributions.kde_peaks
  Data...
```
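One way to track down the cause rather than filter it is to escalate warnings to errors so that the traceback points at the offending line. A minimal sketch with the standard `warnings` module; `find_warning_source` and `noisy_divide` are hypothetical stand-ins, not redflag functions:

```python
import warnings

import numpy as np


def find_warning_source(func, *args, **kwargs):
    """Run `func` with warnings raised as errors; return the warning or None."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            func(*args, **kwargs)
        except Warning as exc:
            return exc
    return None


def noisy_divide(a, b):
    """Stand-in for a library function that overflows on extreme inputs."""
    return np.asarray(a, dtype=float) / np.asarray(b, dtype=float)


exc = find_warning_source(noisy_divide, [1e308], [1e-300])
print(type(exc).__name__, exc)
```

The same escalation is available for a whole test run with `pytest -W error`, which turns each warning into a failing test with a full traceback.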
- Great Expectations
  - seems big and cumbersome
  - lesson: stay lean
- Evidently
  - "framework to evaluate, test and monitor ML models in production."
  - looks nice but quite...