AIF360
Refactor openml_datasets.py
https://github.com/Trusted-AI/AIF360/blob/master/aif360/sklearn/datasets/openml_datasets.py
Add a wrapper around this module so that datasets can be accessed directly through the wrapper.
Instead of doing something like this,
import os

from sklearn.datasets import fetch_openml
from aif360.sklearn.datasets.utils import standardize_dataset

# cache location
DATA_HOME_DEFAULT = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                 '..', 'data', 'raw')

def fetch_adult(subset='all', *, data_home=None, cache=True, binary_race=True,
                usecols=None, dropcols=None, numeric_only=False, dropna=True):
    if subset not in {'train', 'test', 'all'}:
        raise ValueError("subset must be either 'train', 'test', or 'all'; "
                         "cannot be {}".format(subset))
    df = fetch_openml(data_id=1590, data_home=data_home or DATA_HOME_DEFAULT,
                      cache=cache, as_frame=True).frame
the proposal is to have an OpenMLStore:

import abc

class OpenMLStore(abc.ABC):
    @abc.abstractmethod
    def __init__(self, **kwargs):
        pass

    def download(self, data_id, data_home=None, cache=True):
        # use the data_id argument instead of hard-coding 1590 (the Adult dataset)
        df = fetch_openml(data_id=data_id, data_home=data_home or DATA_HOME_DEFAULT,
                          cache=cache, as_frame=True).frame
        # open question: return a DataFrame (as here) or just the output directory location
        return df

    def upload(self, **kwargs):
        pass
The fetch_adult() function can then be updated to use the OpenMLStore abstraction.
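To make the idea concrete, here is a minimal sketch of how fetch_adult() might delegate to a store. It is not the AIF360 implementation. SklearnOpenMLStore, the fetcher argument, and the store keyword are hypothetical names introduced only for illustration, and the sketch makes download() the abstract method rather than __init__. DATA_HOME_DEFAULT is simplified to a path under the current directory, and the subset selection and standardize_dataset steps of the real function are left out.

```python
import abc
import os

# Simplified cache location for this sketch; the real module derives it from __file__.
DATA_HOME_DEFAULT = os.path.join(os.getcwd(), 'data', 'raw')


class OpenMLStore(abc.ABC):
    """Base class from the proposal; subclasses decide how data is fetched."""

    @abc.abstractmethod
    def download(self, data_id, data_home=None, cache=True):
        """Return the dataset identified by data_id as a frame."""


class SklearnOpenMLStore(OpenMLStore):
    """Hypothetical concrete store that delegates to a fetch function."""

    def __init__(self, fetcher=None):
        # The fetcher is injectable so the store can be tested without network access.
        self._fetcher = fetcher

    def download(self, data_id, data_home=None, cache=True):
        fetcher = self._fetcher
        if fetcher is None:
            # Imported lazily so the sketch loads even without scikit-learn installed.
            from sklearn.datasets import fetch_openml as fetcher
        bunch = fetcher(data_id=data_id,
                        data_home=data_home or DATA_HOME_DEFAULT,
                        cache=cache, as_frame=True)
        return bunch.frame


def fetch_adult(subset='all', *, data_home=None, cache=True, store=None):
    if subset not in {'train', 'test', 'all'}:
        raise ValueError("subset must be either 'train', 'test', or 'all'; "
                         "cannot be {}".format(subset))
    store = store or SklearnOpenMLStore()
    # 1590 is the OpenML data_id used for Adult in the existing code.
    df = store.download(1590, data_home=data_home, cache=cache)
    # Subset selection and standardize_dataset(...) would follow here, unchanged.
    return df
```

With this shape, the other fetch_* functions in the module would differ only in the data_id they pass to the store and in their post-processing.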
Can you elaborate on the shortcomings of the current method?
Hello, I'd like to work on this issue.
I am making progress on this issue and would like to continue working on it.
Hello, I would like to work on this issue.
Hello, I would like to work on this issue.
Hey @vandanapathare. I have already raised the PR and am finishing up the code review.