Remove Deprecated Code

Open koaning opened this issue 2 years ago • 5 comments

Some code has been deprecated over the years. With the 0.7.0 release underway, it should finally be removed.

Here are some of the examples that I found

koaning avatar Sep 25 '22 13:09 koaning

Technically, I think the deprecated module isn't needed anymore after this change. Any objections if this dependency is also removed, @MBrouns, or are there new deprecations coming in?

koaning avatar Sep 25 '22 13:09 koaning

You never know, but overall I think we managed to keep deprecations low. Might as well remove it as a dependency in the meantime.

MBrouns avatar Sep 25 '22 14:09 MBrouns

I wanted to suggest another deprecated part of the codebase to clean up: sklearn.datasets.load_boston is used in some of the test files. It is deprecated and will be removed in scikit-learn==1.2.

Explanation given by scikit-learn maintainers:

The Boston housing prices dataset has an ethical problem. You can refer to
the documentation of this function for further details.
  
The scikit-learn maintainers therefore strongly discourage the use of this
dataset unless the purpose of the code is to study and educate about
ethical issues in data science and machine learning.
  
In this special case, you can fetch the dataset from the original
source::
  
      import pandas as pd
      import numpy as np
  
      data_url = "http://lib.stat.cmu.edu/datasets/boston"
      raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
      data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
      target = raw_df.values[1::2, 2]
  
Alternative datasets include the California housing dataset (i.e.
:func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
dataset. You can load the datasets as follows::
  
      from sklearn.datasets import fetch_california_housing
      housing = fetch_california_housing()
  
for the California housing dataset and::
  
      from sklearn.datasets import fetch_openml
      housing = fetch_openml(name="house_prices", as_frame=True)
  
for the Ames housing dataset.

CarloLepelaars avatar Oct 11 '22 11:10 CarloLepelaars

I'm aware of the removal; I was somewhat involved.

Maybe we can add the dataset to our datasets module, with a docstring that gives appropriate context?
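As a rough sketch of what such a loader could look like, assuming the raw file layout from the scikit-learn message quoted above: the names `load_boston_with_context` and `split_interleaved_rows` are hypothetical and not part of scikit-lego, and the docstring wording is only a placeholder for the context the maintainers would want to give.

```python
import numpy as np

# URL taken from the scikit-learn deprecation message quoted in this thread.
BOSTON_URL = "http://lib.stat.cmu.edu/datasets/boston"


def split_interleaved_rows(raw):
    """Rebuild features and target from the raw two-lines-per-record layout.

    Hypothetical helper. Each record spans two consecutive rows of `raw`:
    the even row holds 11 feature values, the odd row holds the remaining
    2 feature values followed by the target.
    """
    data = np.hstack([raw[::2, :], raw[1::2, :2]])
    target = raw[1::2, 2]
    return data, target


def load_boston_with_context():
    """Fetch the Boston housing data from its original source.

    Hypothetical loader. The scikit-learn maintainers removed their own
    loader because the dataset has an ethical problem, and they discourage
    its use unless the purpose is to study and educate about ethical issues
    in data science and machine learning. Consider the California or Ames
    housing datasets for general regression examples.
    """
    # Imported here so the parsing helper above works without pandas.
    import pandas as pd

    raw_df = pd.read_csv(BOSTON_URL, sep=r"\s+", skiprows=22, header=None)
    return split_interleaved_rows(raw_df.to_numpy())
```

Keeping the parsing separate from the download means the row handling can be unit-tested offline with a small synthetic array, while only the thin wrapper needs network access.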

koaning avatar Oct 11 '22 12:10 koaning

This might also be of interest: https://fairlearn.org/main/user_guide/datasets/boston_housing_data.html

koaning avatar Oct 11 '22 12:10 koaning

Fixed in #626

FBruzzesi avatar May 12 '24 11:05 FBruzzesi