
md5 hash displayed to user is wrong

Open charmoniumQ opened this issue 2 years ago • 13 comments

What steps does it take to reproduce the issue?

See the dataset file page here.

  • This page says the "Original File MD5" begins with 9e9be..., but neither downloadable file has that hash.
  • The "Stata Binary (Original File Format)" file has an md5 hash beginning with 20ddc4....
  • The "Tab-Delimited" file has an md5 hash beginning with 1f75c2....
  • However, the "Tab-Delimited" file with the header row removed (cat file.tab | tail --lines=+2 | md5sum) has an md5 hash beginning with 9e9be....
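The comparison in the list above can also be reproduced without a shell. This is a minimal sketch, assuming the tab-delimited download has been saved locally as file.tab (the same placeholder name used in the command above). The function names are illustrative and are not part of Dataverse.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def md5_without_header(data: bytes) -> str:
    """Hash everything after the first newline.

    This mirrors `tail --lines=+2`: the first line (the header row)
    is dropped, and the rest is hashed unchanged.
    """
    _, _, body = data.partition(b"\n")
    return md5_hex(body)


def report(path: str) -> dict:
    """Compute both hashes for a local file (path is a hypothetical example)."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "as_downloaded": md5_hex(data),
        "header_removed": md5_without_header(data),
    }
```

Calling report("file.tab") and comparing both values against the hash shown on the file page shows which variant the displayed value corresponds to.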

This is a bug because it will lead users to believe that they downloaded a corrupted file.

There are two parts to this: the incorrect label and the missing header row. The label should be "Tab-Delimited File MD5", not "Original File MD5". The missing header row is the more interesting part: why does Dataverse send one file to the user, but hash a transformed version of that file?

  • When does this issue occur?

Unknown

  • Which page(s) does it occur on?

The Stata files of this dataset that I checked by hand.

Which version of Dataverse are you using?

The one hosted at https://dataverse.harvard.edu/, 5.13 build 1244-79d6e57

charmoniumQ, Apr 06 '23 19:04