
Add ability to load and inspect individual datasets

Open • npatki opened this issue 2 years ago • 1 comment

Problem Description

The SDGym library currently lets you list the datasets available for benchmarking, but it offers no way to inspect them. Users may want to see what the columns, data types, and values look like before including a dataset in a benchmarking run.

Expected behavior

Add a download_demo function similar to the one in the SDV library. It would return the data and metadata so that SDGym users can inspect the dataset.

Workaround

The SDV library is a dependency of SDGym, so as a workaround you can access the demo datasets through SDV directly.

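# SDV is installed alongside SDGym, so its demo loader can be imported directly.
from sdv.datasets.demo import download_demo

# Returns the dataset together with its metadata.
data, metadata = download_demo(
    modality='single_table',
    dataset_name='adult'
)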
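
Once the workaround has returned data and metadata, the inspection itself could look something like the sketch below. It is only an illustration: column_sdtypes and inspect_demo are hypothetical helper names, not part of SDV or SDGym, and the sketch assumes that metadata.to_dict() returns either a top-level 'columns' mapping (older single-table metadata) or a 'tables' mapping (newer metadata), which may differ between SDV versions.

```python
def column_sdtypes(metadata_dict):
    """Map each table name to a {column: sdtype} dict.

    Hypothetical helper. Assumes the dict has either a top-level
    'columns' key or a 'tables' key holding one entry per table.
    """
    if 'tables' in metadata_dict:
        tables = metadata_dict['tables']
    else:
        # Single-table layout: wrap it under a placeholder table name.
        tables = {'table': metadata_dict}
    return {
        name: {
            column: spec.get('sdtype')
            for column, spec in table.get('columns', {}).items()
        }
        for name, table in tables.items()
    }


def inspect_demo(dataset_name='adult'):
    """Download a single-table demo dataset and print a short summary."""
    # Imported lazily so column_sdtypes can be used without SDV installed.
    from sdv.datasets.demo import download_demo

    data, metadata = download_demo(
        modality='single_table',
        dataset_name=dataset_name
    )
    print(data.head())   # first few rows
    print(data.dtypes)   # pandas dtype of each column
    for table, columns in column_sdtypes(metadata.to_dict()).items():
        print(table, columns)  # sdtype SDV assigns to each column
```

Calling inspect_demo('adult') requires network access, since the demo dataset is downloaded on demand.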

npatki • Nov 13 '23 15:11

For a related discussion, see #253

npatki • Nov 13 '23 15:11