pmlb
fetch metafeatures option in fetch_data
It would be nice to add an option to fetch feature types from fetch_data.
I believe that's captured in #3.
Not as a criterion, though... an actual list of the type of each feature.
Oh, I see. How would we accomplish that? Scrape from the README?
Any progress on this? It shouldn't be much work, using each dataset's metadata file. I can create a draft pull request, something like:
dataset, metadata = fetch_data('adult', return_metadata=True)
However, I'm not sure what information should be included in the metadata... I can think of three possible options:
- the whole metadata.yaml parsed into a dictionary
- a dictionary feature -> feature_type (e.g., {"age": "continuous", "education_type": "categorical", ....})
- a list of the feature types (e.g., ["continuous", "categorical", ....])
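For illustration, here is a minimal sketch of how options 2 and 3 could be derived once metadata.yaml has been parsed into a dictionary (option 1 is simply that dictionary itself). It assumes the parsed file lists its features under a "features" key, with each entry carrying "name" and "type" fields. The helper names `feature_type_dict` and `feature_type_list` are hypothetical and are not part of pmlb. The example values are made up, not taken from the real 'adult' metadata.

```python
# Hypothetical helpers, not part of the pmlb API.
# Assumption: metadata.yaml has already been parsed into a dict (e.g. with
# yaml.safe_load) and lists features as {"name": ..., "type": ...} entries
# under a "features" key.


def feature_type_dict(metadata: dict) -> dict:
    """Option 2: map each feature name to its type."""
    return {f["name"]: f["type"] for f in metadata.get("features", [])}


def feature_type_list(metadata: dict) -> list:
    """Option 3: only the feature types, in the order they are listed."""
    return [f["type"] for f in metadata.get("features", [])]


# Illustrative parsed metadata (made-up values).
metadata = {
    "dataset": "adult",
    "features": [
        {"name": "age", "type": "continuous"},
        {"name": "education_type", "type": "categorical"},
    ],
}

print(feature_type_dict(metadata))
print(feature_type_list(metadata))
```

Option 2 keeps the name-to-type pairing explicit. Option 3 relies on the list order matching the dataset's column order, which would need to be guaranteed separately.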
Thanks for this note @rrunix. 🙏🏽 🙌🏽 @JDRomano2 would be the contact at this point, but if I may chime in: yes, a PR would be most welcome. My suggestion would be that the argument return_metadata could take 'all' (metadata.yaml parsed into a dictionary), 'features' (dictionary of features), or NA (no metadata).
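A rough sketch of how the suggested argument values might be dispatched is shown below. It assumes metadata.yaml has already been parsed into a dict with a "features" list of name/type entries. It also uses Python's None for the "no metadata" case, since NA is not a Python value. `select_metadata` is a hypothetical helper, not existing pmlb code. fetch_data could call it after loading the dataset and return `(dataset, select_metadata(...))` only when return_metadata is not None.

```python
from typing import Optional

# Hypothetical helper mirroring the suggested return_metadata values.
# Assumption: `metadata` is metadata.yaml parsed into a dict, with features
# listed as {"name": ..., "type": ...} entries under a "features" key.


def select_metadata(metadata: dict, return_metadata: Optional[str] = None):
    """Return the metadata view requested by return_metadata."""
    if return_metadata is None:
        # No metadata requested.
        return None
    if return_metadata == "all":
        # The whole parsed metadata.yaml.
        return metadata
    if return_metadata == "features":
        # Feature name -> feature type.
        return {f["name"]: f["type"] for f in metadata.get("features", [])}
    raise ValueError(
        f"return_metadata must be 'all', 'features', or None, got {return_metadata!r}"
    )


# Illustrative parsed metadata (made-up values).
metadata = {"dataset": "adult", "features": [{"name": "age", "type": "continuous"}]}
print(select_metadata(metadata, "features"))
```

Raising on unknown values, rather than silently returning nothing, makes a misspelled option visible immediately.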