biomedical
WIP script to process a static JSON metadata file
@galtay, this is the script (WIP), as a reference for parsing metadata from a static file.
Thanks, this is great as a way of flattening the JSON metadata file. I've added a "one row per split" set of tabular files in the latest version of https://github.com/bigscience-workshop/biomedical/blob/master/scripts/gather_dataset_stats.py (and put the results for the public datasets in Google Drive: https://drive.google.com/drive/folders/1sRTVcQO8CcpaagLWRVM8qieIHrfB9niN?usp=sharing), but yes, we'll need to slice and dice this data a lot of ways!
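For anyone following along, here is a rough sketch of what "one row per split" flattening could look like. It is not the code in `gather_dataset_stats.py`: the nested layout (`dataset -> config -> "splits" -> split -> stats`), the `num_examples` field, and the helper names `flatten_metadata` and `rows_to_tsv` are all assumptions made up for illustration.

```python
import csv
import io
import json

# Hypothetical metadata layout (an assumption, not the real file's schema):
# {dataset: {config: {"splits": {split: {stat_name: value}}}}}
SAMPLE_JSON = """
{
  "dataset_a": {
    "source": {"splits": {"train": {"num_examples": 100},
                          "test": {"num_examples": 20}}}
  },
  "dataset_b": {
    "source": {"splits": {"train": {"num_examples": 50}}}
  }
}
"""


def flatten_metadata(metadata):
    """Return one flat dict per (dataset, config, split) combination."""
    rows = []
    for dataset, configs in metadata.items():
        for config, info in configs.items():
            # Configs without a "splits" key simply contribute no rows.
            for split, stats in info.get("splits", {}).items():
                row = {"dataset": dataset, "config": config, "split": split}
                row.update(stats)
                rows.append(row)
    return rows


def rows_to_tsv(rows):
    """Serialize rows to a TSV string; missing stats become empty cells."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten_metadata(json.loads(SAMPLE_JSON))
tsv_text = rows_to_tsv(rows)
```

Keeping the rows as plain dicts means the same list can be written to TSV, loaded into a dataframe, or grouped by dataset or split, which should help with the slicing and dicing mentioned above.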