Publish md5 hashes of datasets

Open adefazio opened this issue 1 year ago • 3 comments

Description

Is it possible to publish file hashes and directory layouts for all datasets after processing? I would like to run some checks to ensure that there are no discrepancies in the data my team has downloaded and processed.

adefazio avatar Feb 20 '24 19:02 adefazio

The dataset layouts and final sizes are documented in datasets/README.md, in the dropdown items saying "The final directory structure should look like this:".

priyakasimbeg avatar Feb 20 '24 20:02 priyakasimbeg

Thanks, that's useful. Would it be possible to publish hashes of the files as well?

adefazio avatar Feb 20 '24 23:02 adefazio

@chandramouli-sastry could you help close this request? I have all the data from the setup scripts downloaded on kasimbeg-8 under /home/kasimbeg/data. The remaining work is to:

  1. Check the downloaded data against the data setup README to make sure the file structure matches and there are no additional files left over from the download.
  2. Get the hashes for all of the datasets and add them to the data setup README.

priyakasimbeg avatar Feb 28 '24 03:02 priyakasimbeg
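
The two remaining steps above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a script from the repository: the helper names (md5_of_file, build_manifest, compare_manifests) are hypothetical, and it assumes that a manifest mapping relative file paths to MD5 hex digests is an acceptable format for the published hashes.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks
    so that large dataset shards do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its MD5 digest.

    Hypothetical helper: the relative paths double as a record of the
    directory layout, so one manifest covers both structure and content.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected, actual):
    """Report files that are missing, unexpected, or differ in content."""
    expected_keys, actual_keys = set(expected), set(actual)
    return {
        "missing": sorted(expected_keys - actual_keys),
        "extra": sorted(actual_keys - expected_keys),
        "mismatched": sorted(
            k for k in expected_keys & actual_keys if expected[k] != actual[k]
        ),
    }
```

Under these assumptions, a maintainer would run build_manifest on the reference copy of the data and publish the result, and a user would run it on their own copy and pass both to compare_manifests. The "extra" entries correspond to leftover download files, and "mismatched" entries point to files whose contents differ.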