
WIP script to process a static json metadata file

Open ruisi-su opened this issue 2 years ago • 1 comments

@galtay, this is the script (WIP), as a reference for parsing metadata from a static file.

ruisi-su, May 29 '22 01:05

Thanks, this is a great way to flatten the JSON metadata file. I've added a "one row per split" set of tabular files in the latest version of https://github.com/bigscience-workshop/biomedical/blob/master/scripts/gather_dataset_stats.py and put the results for the public datasets in Google Drive: https://drive.google.com/drive/folders/1sRTVcQO8CcpaagLWRVM8qieIHrfB9niN?usp=sharing. But yes, we'll need to slice and dice this data in a lot of ways!

galtay, May 29 '22 15:05
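
For readers following along, here is a minimal sketch of what "flattening" a static JSON metadata file into "one row per split" could look like. It is not the WIP script from this issue, nor the logic in `gather_dataset_stats.py`. The nested layout (dataset, then config, then split, then stats), the field names `num_examples` and `num_tokens`, and the helper `flatten_metadata` are all assumptions made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical metadata layout (an assumption, not the repository's real schema):
# {dataset_name: {config_name: {split_name: {stat_name: value}}}}
SAMPLE_JSON = """
{
  "dataset_a": {
    "source": {
      "train": {"num_examples": 100, "num_tokens": 2500},
      "test": {"num_examples": 20, "num_tokens": 480}
    }
  },
  "dataset_b": {
    "source": {
      "train": {"num_examples": 50, "num_tokens": 900}
    }
  }
}
"""


def flatten_metadata(meta):
    """Turn the nested metadata dict into a list of flat rows, one per split."""
    rows = []
    for dataset, configs in meta.items():
        for config, splits in configs.items():
            for split, stats in splits.items():
                # Copy the identifying keys next to the per-split stats.
                rows.append(
                    {"dataset": dataset, "config": config, "split": split, **stats}
                )
    return rows


rows = flatten_metadata(json.loads(SAMPLE_JSON))

# Write the rows as CSV text (in memory here; a real script would use a file).
fieldnames = ["dataset", "config", "split", "num_examples", "num_tokens"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# One example of "slicing": total examples per dataset across all splits.
totals = defaultdict(int)
for row in rows:
    totals[row["dataset"]] += row["num_examples"]
```

Once the data is in one-row-per-split form, other slices (per config, per split name, and so on) are simple group-bys over the same rows, which is the main reason to flatten first.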