PyDataset

Regression/Classification info

Open ogencoglu opened this issue 7 years ago • 4 comments

Hi,

It would be nice to have a third column in the data() output indicating whether each dataset can be used for regression or classification problems.

ogencoglu avatar May 19 '17 11:05 ogencoglu

Hi @ogencoglu, sounds like a cool idea, thanks. Any thoughts on how to approach clustering them?

iamaziz avatar May 19 '17 16:05 iamaziz

I think it is just manual work. Not all datasets may be suitable for this, but many machine learning people look for datasets to try their algorithms/implementations on a smaller scale before moving on to well-known benchmark datasets.

ogencoglu avatar May 21 '17 18:05 ogencoglu

agreed it'd be nice to filter for regression or classification, but i don't see how you could properly categorize datasets.

a regression dataset could be a classification dataset, and vice versa, depending on your preprocessing strategy (e.g. binning) and choice of target feature.

for example, the canonical iris dataset, used for classification, could be viewed as regression too.
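A minimal sketch of that point, using only the standard library. The three rows are one sample per species from the iris data; the helper names and the 1.0 cm threshold are hypothetical choices made for illustration, not anything PyDataset provides.

```python
# One row per species from the iris data, using the R-style column names.
rows = [
    {"Sepal.Length": 5.1, "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 7.0, "Petal.Width": 1.4, "Species": "versicolor"},
    {"Sepal.Length": 6.3, "Petal.Width": 2.5, "Species": "virginica"},
]


def classification_target(row):
    """Usual framing: predict the categorical species label."""
    return row["Species"]


def regression_target(row):
    """Alternative framing: predict a continuous measurement instead."""
    return row["Petal.Width"]


def binned_target(row, threshold=1.0):
    """Binning turns the continuous target back into a class label.

    The threshold is an arbitrary value picked for this example.
    """
    return "wide" if row["Petal.Width"] >= threshold else "narrow"


class_labels = [classification_target(r) for r in rows]
reg_values = [regression_target(r) for r in rows]
binned_labels = [binned_target(r) for r in rows]

print(class_labels)
print(reg_values)
print(binned_labels)
```

The same rows yield a classification problem, a regression problem, or a different classification problem, depending only on which column is the target and how it is preprocessed.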

mynameisvinn avatar Jul 02 '17 13:07 mynameisvinn

My idea was something similar to UCI data repo: http://archive.ics.uci.edu/ml/datasets.html

The column can be "Default Task". Some datasets may even have both Classification and Regression.
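Roughly, a hand-maintained mapping could be joined onto the listing. This is only a sketch: the `listing` rows stand in for the data() output, and `DEFAULT_TASKS`, `with_default_task`, `filter_by_task`, the task labels, and the "Unlabeled" fallback are all hypothetical, not part of PyDataset.

```python
# Stand-in for the two-column dataset listing (id and title).
listing = [
    {"dataset_id": "iris", "title": "Edgar Anderson's Iris Data"},
    {"dataset_id": "mtcars", "title": "Motor Trend Car Road Tests"},
    {"dataset_id": "AirPassengers",
     "title": "Monthly Airline Passenger Numbers 1949-1960"},
]

# Manually curated labels (assumed for this example); a dataset may list
# more than one task.
DEFAULT_TASKS = {
    "iris": ["Classification"],
    "mtcars": ["Regression", "Classification"],
}


def with_default_task(rows, tasks):
    """Return copies of the rows with a third 'default_task' column."""
    labelled = []
    for row in rows:
        labels = tasks.get(row["dataset_id"], [])
        text = ", ".join(labels) if labels else "Unlabeled"
        labelled.append({**row, "default_task": text})
    return labelled


def filter_by_task(rows, tasks, task):
    """Return the ids of datasets whose labels include the given task."""
    return [r["dataset_id"] for r in rows
            if task in tasks.get(r["dataset_id"], [])]


labelled = with_default_task(listing, DEFAULT_TASKS)
for row in labelled:
    print(row["dataset_id"], "|", row["title"], "|", row["default_task"])
```

Datasets without a curated label stay in the listing and are simply marked as unlabeled, so the manual work can be done incrementally.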

ogencoglu avatar Jul 03 '17 09:07 ogencoglu