scikit-lego
Remove Deprecated Code
Over the years some deprecated code has accumulated; with the 0.7.0 release underway, it should finally be removed.
Here are some examples that I found:
- [ ] sklego.meta.outlier_remover
- [ ] sklego.linear_model (this is about the FairClassifier)
- [x] sklego.meta.grouped_estimator
Technically, I think the `deprecated` module isn't needed anymore after this change. Any objections if this dependency is also removed, @MBrouns, or are there new deprecations coming in?
You never know, but overall I think we managed to keep deprecations low. Might as well remove it as a dependency in the meantime.
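If a new deprecation does come up after the dependency is gone, a few lines of standard library would probably cover it. A minimal sketch, assuming a plain `DeprecationWarning` is all that is needed; the names `deprecated`, `old_helper`, and `new_helper` are made up for illustration and are not scikit-lego API:

```python
import functools
import warnings


def deprecated(reason):
    """Hypothetical stdlib-only stand-in for a third-party deprecation decorator."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("use new_helper instead")
def old_helper(x):
    # Illustrative function only; it still works but warns on every call.
    return x + 1
```

The decorated function keeps its name and docstring through `functools.wraps`, so documentation and introspection are unaffected.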
Wanted to suggest another deprecated part of the codebase to clean up: `sklearn.datasets.load_boston` is used in some of the test files. It is deprecated and will be removed in `scikit-learn==1.2`.
Explanation given by scikit-learn maintainers:
The Boston housing prices dataset has an ethical problem. You can refer to
the documentation of this function for further details.

The scikit-learn maintainers therefore strongly discourage the use of this
dataset unless the purpose of the code is to study and educate about
ethical issues in data science and machine learning.

In this special case, you can fetch the dataset from the original
source::

    import pandas as pd
    import numpy as np

    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]

Alternative datasets include the California housing dataset (i.e.
:func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
dataset. You can load the datasets as follows::

    from sklearn.datasets import fetch_california_housing
    housing = fetch_california_housing()

for the California housing dataset and::

    from sklearn.datasets import fetch_openml
    housing = fetch_openml(name="house_prices", as_frame=True)

for the Ames housing dataset.
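
If the tests only need some regression data and nothing Boston-specific, synthetic data would avoid both the deprecated loader and a network download. A minimal sketch, assuming that is the case; `make_test_data` is a hypothetical helper name, and the 13 features simply mirror the shape of the Boston data. This is not necessarily how the test files should end up looking:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression


def make_test_data(n_samples=100, n_features=13, random_state=42):
    """Hypothetical test helper: synthetic regression data, no download needed."""
    X, y = make_regression(
        n_samples=n_samples,
        n_features=n_features,
        noise=1.0,
        random_state=random_state,  # fixed seed keeps the tests deterministic
    )
    return X, y


# Any estimator under test can be fitted on the synthetic data instead.
X, y = make_test_data()
model = LinearRegression().fit(X, y)
```

A fixed `random_state` keeps the data identical between runs, which matters if a test compares against stored values.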
I'm aware of the removal, I was somewhat involved.
Maybe we can add the dataset to our datasets module, with a docstring that gives appropriate context?
This might also be of interest: https://fairlearn.org/main/user_guide/datasets/boston_housing_data.html
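
On adding the dataset to the datasets module: a rough sketch of what such a loader could look like, reusing the parsing steps from the scikit-learn message. The function name `load_boston_with_context` and the `source` parameter are assumptions for illustration, not existing sklego API, and the docstring wording would need proper review:

```python
import numpy as np
import pandas as pd

# URL taken from the scikit-learn deprecation message.
BOSTON_URL = "http://lib.stat.cmu.edu/datasets/boston"


def load_boston_with_context(source=BOSTON_URL):
    """Load the Boston housing data (hypothetical loader, not part of sklego).

    The scikit-learn maintainers state that this dataset has an ethical
    problem and strongly discourage its use unless the purpose is to study
    and educate about ethical issues in data science and machine learning.

    Parameters
    ----------
    source : str or file-like
        Path, URL, or open buffer holding the raw file in its original layout.

    Returns
    -------
    data : ndarray of shape (n_records, 13)
    target : ndarray of shape (n_records,)
    """
    # Skip the 22 leading lines, as in the scikit-learn snippet.
    raw_df = pd.read_csv(source, sep=r"\s+", skiprows=22, header=None)
    values = raw_df.values
    # Each record spans two rows: 11 values, then 2 more features plus the target.
    data = np.hstack([values[::2, :], values[1::2, :2]])
    target = values[1::2, 2]
    return data, target
```

Accepting a `source` argument means the parsing can be tested against a small local buffer without touching the network.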
Fixed in #626