Verify file integrity of downloaded files by hash sum

Open skasberger opened this issue 4 years ago • 2 comments

Verify the integrity of downloaded files by comparing them against their hash values. Mentioned in a call by @atrisovic.

Prepare

  • [x] check Python hash implementation
    • [x] md5
    • [x] sha-1
    • [x] sha-256
    • [x] sha-512
  • [x] check what has to be hashed from the response: resp.content directly, or does a temporary file need to be saved before hashing? -> requests.Response.content
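
On the question of hashing `resp.content` versus a saved temporary file: the digest depends only on the bytes, so both routes should give the same result. A minimal sketch using only the standard library; the helper names `hash_bytes` and `hash_stream` are hypothetical and not part of pyDataverse, and an in-memory `io.BytesIO` stands in for a file on disk.

```python
import hashlib
import io


def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Hash a complete in-memory payload (e.g. requests.Response.content)."""
    return hashlib.new(algorithm, data).hexdigest()


def hash_stream(stream, algorithm: str = "md5", chunk_size: int = 8192) -> str:
    """Hash a binary file-like object chunk by chunk (e.g. a temporary file)."""
    h = hashlib.new(algorithm)
    # read() returns b"" at end of stream, which stops the iteration
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# Hypothetical payload standing in for a downloaded file
payload = b"example file content" * 1000

# Both routes yield the same digest for every algorithm listed above
for name in ("md5", "sha1", "sha256", "sha512"):
    assert hash_bytes(payload, name) == hash_stream(io.BytesIO(payload), name)
```

Hashing `resp.content` directly is the simpler route; the chunked variant mainly matters when a file is too large to hold in memory comfortably.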

Implementation

  • [ ] Write tests
    • [ ] add argument to enable/disable this
    • [ ] add argument to pass checksum algorithm to be used: default = MD5, other = SHA-1, SHA-256 or SHA-512
  • [ ] Update code: get_datafile()
import hashlib
from pyDataverse.api import NativeApi
api = NativeApi("https://data.aussda.at")
resp = api.get_datafile(3702)
# Pick the hash algorithm; MD5 is the proposed default.
m = hashlib.md5()
# m = hashlib.sha1()
# m = hashlib.sha256()
# m = hashlib.sha512()
# Hash the raw response body, then read the hex digest.
m.update(resp.content)
m.hexdigest()
  • [ ] Update Docs
  • [ ] Update Docstrings
  • [ ] Run pytest
  • [ ] Run tox
  • [ ] Run pylint
  • [ ] Run mypy
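
The two proposed arguments (a switch to enable or disable verification, and a checksum algorithm defaulting to MD5) could look roughly like the sketch below. Everything here is an assumption for illustration: `verify_checksum`, its parameter names, and the algorithm mapping are hypothetical and not the pyDataverse API, and the expected checksum is assumed to be supplied by the caller.

```python
import hashlib

# Algorithm names as written in this issue, mapped to hashlib names
_ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}


def verify_checksum(
    content: bytes,
    expected: str,
    algorithm: str = "MD5",
    enabled: bool = True,
) -> bool:
    """Return True if the digest of `content` matches `expected`.

    Hypothetical helper: `enabled=False` skips the check entirely.
    """
    if not enabled:
        return True
    try:
        name = _ALGORITHMS[algorithm.upper()]
    except KeyError:
        raise ValueError(f"Unsupported checksum algorithm: {algorithm!r}") from None
    actual = hashlib.new(name, content).hexdigest()
    # Hex digests are compared case-insensitively
    return actual == expected.strip().lower()


# Usage with a made-up payload; in practice `content` would be resp.content
content = b"abc"
expected_md5 = hashlib.md5(content).hexdigest()
assert verify_checksum(content, expected_md5)
assert not verify_checksum(content + b"x", expected_md5)
```

Returning a boolean keeps the sketch small; whether a mismatch should instead raise an exception inside get_datafile() is a design decision left open here.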

Review

Follow-Ups

  • [ ]

skasberger avatar Mar 01 '21 11:03 skasberger

Hey @skasberger!

This is how I solved the checksum verification problem in my previous project: https://github.com/atrisovic/dataverse-r-study/blob/0fc1c223ed0a0777633f94f9b7cad699003aaf7a/docker/download_dataset.py#L32-L39

I tried playing with the client to incorporate the code, but I think it's quite awkward to do it the same way. I can still share the code if you think it would be helpful, but I think there needs to be another approach x)

atrisovic avatar Mar 26 '21 21:03 atrisovic

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

pdurbin avatar Feb 14 '24 20:02 pdurbin