hash not matching, (failing tests) seems related to matminer

Open sgbaird opened this issue 3 years ago • 2 comments

@ardunn tests are failing, and it seems related to matminer, e.g.:

======================================================================
ERROR: test_has_polymorphs (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 464, in test_has_polymorphs
    mbt = MatbenchTask("matbench_steels", autoload=True)
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 89, in __init__
    self.df = load(self.dataset_name) if autoload else None
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
======================================================================
ERROR: test_instantiation (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 35, in test_instantiation
    MatbenchTask(ds, autoload=True)
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 89, in __init__
    self.df = load(self.dataset_name) if autoload else None
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
======================================================================
ERROR: test_record (matbench.tests.test_task.TestMatbenchTask)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/matbench/matbench/matbench/tests/test_task.py", line 211, in test_record
    mbt.load()
  File "/home/runner/work/matbench/matbench/matbench/task.py", line 235, in load
    self.df = load(self.dataset_name)
  File "/home/runner/work/matbench/matbench/matbench/data_ops.py", line 66, in load
    df = load_dataset(dataset_name)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/dataset_retrieval.py", line 66, in load_dataset
    _validate_dataset(
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/matminer/datasets/utils.py", line 89, in _validate_dataset
    raise UserWarning(
UserWarning: Error, hash of downloaded file does not match that included in metadata, the data may be corrupt or altered
----------------------------------------------------------------------
Ran 30 tests in 73.767s

Originally posted by @sgbaird in https://github.com/materialsproject/matbench/issues/152#issuecomment-1154728796

sgbaird, Jun 27 '22 21:06

This is causing some downstream issues in CrabNet CI, too: https://github.com/sparks-baird/CrabNet/runs/7081532854?check_suite_focus=true

sgbaird, Jun 27 '22 21:06

Note to self: this is caused by matminer downloading datasets from figshare, with the CI IP likely being blocked or rate limited (or some other CI-specific nonsense). It can likely be fixed by trying the download and, if it fails, retrying after some set amount of time (or by including this retry directly in the matminer core code for load_dataset).

ardunn, Aug 13 '22 09:08
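
The retry idea in the note above could look roughly like the sketch below. This is not code from matbench or matminer: `retry_call` and its parameters (`attempts`, `wait_seconds`, `retry_on`, `sleep`) are hypothetical names chosen for illustration. It assumes the failure surfaces as a raised `UserWarning` (as in the tracebacks above) or as an `OSError` from the network layer, and it does not address whether a bad cached file needs to be deleted before the next attempt.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (UserWarning, OSError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on a listed exception, wait a fixed time and try again.

    Hypothetical helper, not part of matbench or matminer.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(wait_seconds)  # fixed wait before the next attempt
    raise AssertionError("unreachable")


# Possible usage inside a loader (assumes matminer is installed; not run here):
# from matminer.datasets import load_dataset
# df = retry_call(lambda: load_dataset("matbench_steels"))
```

A fixed wait matches the "retrying after some set amount of time" wording; exponential backoff would be a reasonable alternative if the rate limiting turns out to be strict. Injecting `sleep` keeps the helper testable without real delays.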