relbench
Unable to obtain rel-stackex because of hash mismatch
I am getting this error while trying to download the rel-stackex dataset. I am following the examples in the repo README. rel-amazon does get downloaded fine.
>>> dataset = get_dataset(name="rel-stackex")
Downloading file 'rel-stackex/db.zip' from 'https://relbench.stanford.edu/staging_data/rel-stackex/db.zip' to '/root/.cache/relbench'.
100%|███████████████████████████████████████| 882M/882M [00:00<00:00, 3.58TB/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/relbench/datasets/__init__.py", line 18, in get_dataset
return dataset_cls_dict[name](*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/relbench/datasets/stackex.py", line 26, in __init__
super().__init__(process=process)
File "/usr/local/lib/python3.10/dist-packages/relbench/data/dataset.py", line 67, in __init__
db_path = _pooch.fetch(
File "/usr/local/lib/python3.10/dist-packages/pooch/core.py", line 589, in fetch
stream_download(
File "/usr/local/lib/python3.10/dist-packages/pooch/core.py", line 808, in stream_download
hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
File "/usr/local/lib/python3.10/dist-packages/pooch/hashes.py", line 176, in hash_matches
raise ValueError(
ValueError: SHA256 hash of downloaded file (db.zip) does not match the known hash: expected dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c but got deb00ccdf825e569b34935834444429cd1c0074b50226b12d616aab22d36242d. Deleted download for safety. The downloaded file may have been corrupted or the known hash may be outdated.
FYI: I was able to proceed by downloading it directly into the local cache directory.
Not sure what is causing it to fail when using get_dataset(), since I didn't have to update the hardcoded hashes. I am on the ubuntu:latest Docker image running on a Mac, with Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux.
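To tell whether the download or the registry is at fault, it may help to hash the manually downloaded file and compare it with the expected value from the traceback. Below is a minimal sketch using only the standard library. The helper names `sha256_of_file` and `matches_known_hash` are hypothetical, and the file location is an assumption pieced together from the cache directory and file name in the download log.

```python
import hashlib
from pathlib import Path

# Hash the registry expects, copied from the traceback above.
EXPECTED = "dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large zip is never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(path, known_hash):
    """True if the file's SHA256 equals known_hash (case-insensitive)."""
    return sha256_of_file(path) == known_hash.lower()


if __name__ == "__main__":
    # Assumed location: cache dir and relative file name from the log above.
    db_zip = Path("/root/.cache/relbench") / "rel-stackex" / "db.zip"
    if db_zip.exists():
        print(sha256_of_file(db_zip))
        print("matches registry:", matches_known_hash(db_zip, EXPECTED))
    else:
        print(f"{db_zip} not found; adjust the path for your setup")
```

If the manually downloaded file matches the expected hash, the registry is probably fine and the copy fetched through get_dataset() was most likely incomplete or altered in transit.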
A workaround is to edit /PATH-TO-miniconda3/lib/python3.8/site-packages/relbench/__init__.py, change the corresponding hash in the pooch.create call, and re-import the package:
_pooch = pooch.create(
    path=pooch.os_cache("relbench"),
    base_url="https://relbench.stanford.edu/staging_data/",  # TODO: change
    registry={
        # extremely small dataset only used for testing download functionality
        "rel-amazon-fashion_5_core/db.zip": "27e08bc808438e8619560c54d0a4a7a11e965b90b8c70ef3a0928b44a46ad028",
        "rel-amazon-fashion_5_core/tasks/rel-amazon-churn.zip": "d98f2240aefa0f175dab2fce4a48a1cc595be584d4960cd9eb750d012326117d",
        "rel-amazon-fashion_5_core/tasks/rel-amazon-ltv.zip": "bd2b7b798efad2838a3701def8386dba816b45ef277a8e831052b79f5448aed8",
        "rel-stackex/db.zip": "dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c",
        "rel-stackex/tasks/rel-stackex-engage.zip": "9afce696507cf2f1a2655350a3d944fd411b007c05a389995fe7313084008d18",
        "rel-stackex/tasks/rel-stackex-votes.zip": "0dab5bebd76a95d689c8a3a62026c1c294a252c561fd940e8d9329d165d98a5a",
        "rel-amazon-books_5_core/db.zip": "2f6bd920bcfe08cbb7d47115f47f8d798a2ec1a034b6c2f3d8d9906e967454b4",
        "rel-amazon-books_5_core/tasks/rel-amazon-churn.zip": "d3890621b1576a9d5b6bc273cdd2ea2084aeaf9c8055c1421ded84be0c48dacb",
        "rel-amazon-books_5_core/tasks/rel-amazon-ltv.zip": "2e91be0ca5d9f591d8e33a40f70b97db346090a8bb9f3a94f49b147f0dc136be",
    },
)
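If editing the installed package is inconvenient, the same change could be made on a copy of the registry mapping. The sketch below is only an illustration: `with_updated_hash` is a hypothetical helper, the new hash is a placeholder, and assigning the result to relbench's `_pooch.registry` is an assumption about the installed version (pooch keeps the registry as a plain dict on the `Pooch` object). Swap in a new hash only after confirming that the file itself is intact, since otherwise the integrity check no longer protects against a corrupted download.

```python
import re

# A SHA256 digest is 64 lowercase hexadecimal characters.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def with_updated_hash(registry, key, new_hash):
    """Return a copy of registry with key mapped to new_hash.

    Raises KeyError for an unknown file and ValueError for a malformed hash,
    so a typo cannot silently add or corrupt an entry.
    """
    if key not in registry:
        raise KeyError(f"{key!r} is not in the registry")
    new_hash = new_hash.lower()
    if not SHA256_HEX.fullmatch(new_hash):
        raise ValueError("expected a 64-character hexadecimal SHA256 digest")
    updated = dict(registry)  # leave the caller's mapping untouched
    updated[key] = new_hash
    return updated


# Entry copied from the registry shown above.
registry = {
    "rel-stackex/db.zip": "dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c",
}

# Placeholder only: substitute the SHA256 of a file you have verified.
NEW_HASH = "0" * 64

patched = with_updated_hash(registry, "rel-stackex/db.zip", NEW_HASH)
```

Returning a copy keeps the original mapping available for comparison, which is useful when checking which entries were changed.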
Should be fixed in v1.0.0, which we are working towards officially releasing soon.