relbench

Unable to obtain rel-stackex because of hash mismatch

Open · rohitnayak opened this issue 1 year ago · 2 comments

I am getting this error while trying to download the rel-stackex dataset, following the examples in the repo README. rel-amazon downloads fine.

>>> dataset = get_dataset(name="rel-stackex")
Downloading file 'rel-stackex/db.zip' from 'https://relbench.stanford.edu/staging_data/rel-stackex/db.zip' to '/root/.cache/relbench'.
100%|███████████████████████████████████████| 882M/882M [00:00<00:00, 3.58TB/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/relbench/datasets/__init__.py", line 18, in get_dataset
    return dataset_cls_dict[name](*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/relbench/datasets/stackex.py", line 26, in __init__
    super().__init__(process=process)
  File "/usr/local/lib/python3.10/dist-packages/relbench/data/dataset.py", line 67, in __init__
    db_path = _pooch.fetch(
  File "/usr/local/lib/python3.10/dist-packages/pooch/core.py", line 589, in fetch
    stream_download(
  File "/usr/local/lib/python3.10/dist-packages/pooch/core.py", line 808, in stream_download
    hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
  File "/usr/local/lib/python3.10/dist-packages/pooch/hashes.py", line 176, in hash_matches
    raise ValueError(
ValueError: SHA256 hash of downloaded file (db.zip) does not match the known hash: expected dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c but got deb00ccdf825e569b34935834444429cd1c0074b50226b12d616aab22d36242d. Deleted download for safety. The downloaded file may have been corrupted or the known hash may be outdated.
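The error is raised by pooch's `hash_matches` check (called with `strict=True`). As a rough illustration only, and not pooch's actual code, the check amounts to hashing the downloaded file with SHA256 and raising if the digest differs from the registry value. `sha256_of` and `check_sha256` are hypothetical helper names.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sha256(path, known_hash, delete_on_mismatch=True):
    """Raise ValueError if the file's SHA256 differs from known_hash (hypothetical helper)."""
    actual = sha256_of(path)
    if actual != known_hash.lower():
        if delete_on_mismatch:
            os.remove(path)  # mirrors "Deleted download for safety." in the error message
        raise ValueError(
            f"SHA256 hash of downloaded file ({os.path.basename(path)}) does not match "
            f"the known hash: expected {known_hash} but got {actual}."
        )
    return actual
```

Running `sha256_of` on a manually downloaded db.zip is also a quick way to see whether it matches the expected `dfb84faa...` value before putting it in the cache.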

rohitnayak · Mar 30 '24 11:03

FYI: I was able to proceed by downloading db.zip directly into the local cache directory (/root/.cache/relbench).
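A minimal sketch of that workaround, using only the standard library. It assumes pooch stores each registry key as `<cache dir>/<key>`; the log shows the cache dir as `/root/.cache/relbench` and the key as `rel-stackex/db.zip`. `download_into_cache` is a hypothetical helper and does not verify the hash itself. As far as I understand pooch's `fetch`, it still compares an existing cached file against the registry hash before using it.

```python
import os
import urllib.request


def download_into_cache(url, cache_dir, key):
    """Fetch url and save it as cache_dir/key, creating subdirectories as needed."""
    dest = os.path.join(cache_dir, *key.split("/"))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)  # write to a temporary name first
    os.replace(tmp, dest)  # move into place only after the download finished
    return dest


# Example with the values from the log (adjust cache_dir for your system):
# download_into_cache(
#     "https://relbench.stanford.edu/staging_data/rel-stackex/db.zip",
#     os.path.expanduser("~/.cache/relbench"),
#     "rel-stackex/db.zip",
# )
```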

I'm not sure why it fails through get_dataset(), since I didn't have to update the hardcoded hashes. I am running the ubuntu:latest Docker image on a Mac, with Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux.

rohitnayak · Mar 30 '24 22:03

You need to edit relbench/__init__.py in your site-packages directory (e.g. /PATH-TO-miniconda3/lib/python3.8/site-packages/relbench/__init__.py), change the corresponding hash in pooch.create, and re-import the package:

import pooch

_pooch = pooch.create(
    path=pooch.os_cache("relbench"),
    base_url="https://relbench.stanford.edu/staging_data/",  # TODO: change
    registry={
        # extremely small dataset only used for testing download functionality
        "rel-amazon-fashion_5_core/db.zip": "27e08bc808438e8619560c54d0a4a7a11e965b90b8c70ef3a0928b44a46ad028",
        "rel-amazon-fashion_5_core/tasks/rel-amazon-churn.zip": "d98f2240aefa0f175dab2fce4a48a1cc595be584d4960cd9eb750d012326117d",
        "rel-amazon-fashion_5_core/tasks/rel-amazon-ltv.zip": "bd2b7b798efad2838a3701def8386dba816b45ef277a8e831052b79f5448aed8",
        "rel-stackex/db.zip": "dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c",  # the hash to change for this issue
        "rel-stackex/tasks/rel-stackex-engage.zip": "9afce696507cf2f1a2655350a3d944fd411b007c05a389995fe7313084008d18",
        "rel-stackex/tasks/rel-stackex-votes.zip": "0dab5bebd76a95d689c8a3a62026c1c294a252c561fd940e8d9329d165d98a5a",
        "rel-amazon-books_5_core/db.zip": "2f6bd920bcfe08cbb7d47115f47f8d798a2ec1a034b6c2f3d8d9906e967454b4",
        "rel-amazon-books_5_core/tasks/rel-amazon-churn.zip": "d3890621b1576a9d5b6bc273cdd2ea2084aeaf9c8055c1421ded84be0c48dacb",
        "rel-amazon-books_5_core/tasks/rel-amazon-ltv.zip": "2e91be0ca5d9f591d8e33a40f70b97db346090a8bb9f3a94f49b147f0dc136be",
    },
)
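Rather than guessing the `/PATH-TO-miniconda3/...` prefix, the installed file can be located with `importlib.util.find_spec`, which finds a top-level package without importing it. This is a minimal sketch; `find_package_init` is a hypothetical helper, and in relbench versions other than the one discussed here the registry may live in a different module.

```python
import importlib.util


def find_package_init(name):
    """Return spec.origin for a top-level package (its __init__.py), or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin  # for a regular package this is .../<name>/__init__.py


print(find_package_init("relbench"))  # prints None if relbench is not installed
```

After editing, restarting the Python interpreter is the simplest way to re-import the package: `importlib.reload` re-executes only the module you pass it, so other modules that already imported names from it keep the old objects.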

xcvil · May 01 '24 22:05

Should be fixed in v1.0.0, which we are working towards officially releasing soon.
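To check whether an environment already has a release that should include the fix, a rough sketch compares the installed version (read via `importlib.metadata`) with 1.0.0. The helper names are hypothetical, and the parsing is deliberately simple: a pre-release such as `1.0.0rc1` is treated as `1.0.0`. The `packaging` library's `Version` class handles such cases properly.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '1.0.0' or '1.0.0rc1' into (1, 0, 0); pre-release suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions like '0.1' to three components
    return tuple(parts)


def has_fix(package="relbench", fixed_in="1.0.0"):
    """Return True/False for installed >= fixed_in, or None if the package is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(fixed_in)
```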

rishabh-ranjan · Jul 22 '24 21:07