NonMatchingChecksumError while downloading 'multi_news' or 'cnn_dailymail' dataset
Short description
Getting NonMatchingChecksumError while downloading the multi_news or cnn_dailymail datasets.
Environment information
- Operating System: Colab
- Python version: 3.10
- tensorflow-datasets/tfds-nightly version: tensorflow-datasets 4.9.4
- tensorflow/tf-nightly version: tensorflow 2.15
- Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
(https://colab.sandbox.google.com/gist/singhniraj08/9f80bc167706b9b351b75e003dcad39c/untitled2.ipynb)
Link to logs
NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=1vRY2wM6rlOZrf9exGTm5pXj5ExlVwJ0C, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_1vRY2wM6rlOZrf9exGTm5pXj5OT0RBXCg5OWBrYMJXysF1hdrkZtPhK-7JWdYi2HrYYc.tmp.c134b8c8d86c4764bad073c9d79db385/download, has wrong checksum:
- Expected: UrlInfo(size=245.06 MiB, checksum='64ae4d2483b248c9664b50bacfab6821f8a3e93f382c7587686fa4a127f77626', filename='multi-news-original-20190725T164630Z-001.zip')
- Got: UrlInfo(size=2.40 KiB, checksum='d86ce49a2cafe0ed25eae0c9a5ed9abf8db1e34414e3acb667e316ad221c73c5', filename='download')
To debug, see: https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror
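The log reports 2.40 KiB received where 245.06 MiB was expected, which suggests the downloaded artifact may not be the archive at all (for example, an HTML page served in place of the file). That is an inference and the log does not confirm it. A minimal sketch for inspecting whatever landed on disk, using only the standard library; the helper name `describe_download` is hypothetical and not part of TFDS:

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Summarise a downloaded file: size, zip validity, and whether it looks like HTML.

    Hypothetical helper for debugging; it is not part of tensorflow-datasets.
    """
    path = Path(path)
    # Read only the first bytes so large archives are not loaded into memory.
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    return {
        "size_bytes": path.stat().st_size,
        "is_zip": zipfile.is_zipfile(path),
        "looks_like_html": head.startswith((b"<!doctype html", b"<html")),
    }
```

If `is_zip` is false and `looks_like_html` is true for the file named in the error, the checksum mismatch most likely comes from the server response rather than from a changed dataset.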
Expected behavior
The dataset should download without any issues.
Hello @singhniraj08, this is a persistent problem in tfds (#3935) and there is no solution so far, although you can bypass the issue by downloading the dataset manually.
Thank you,
@singhniraj08, you can visit this link for guidance on fixing the error: https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror. As far as I know, this issue has not been solved yet.
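For the manual-download workaround mentioned in this thread, it may help to verify the archive before handing it to TFDS. The sketch below checks a manually downloaded file against the SHA-256 and filename from the error message, then copies it into a manual directory. The helper names are hypothetical. It also assumes, based on the linked documentation, that TFDS looks for manually downloaded files in a `manual_dir` (by default `~/tensorflow_datasets/downloads/manual/`) under the expected filename; please confirm this against your TFDS version.

```python
import hashlib
import shutil
from pathlib import Path

# Expected values copied from the error message in this issue.
EXPECTED_SHA256 = "64ae4d2483b248c9664b50bacfab6821f8a3e93f382c7587686fa4a127f77626"
EXPECTED_NAME = "multi-news-original-20190725T164630Z-001.zip"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_manual_download(src, manual_dir,
                          expected_sha256=EXPECTED_SHA256,
                          name=EXPECTED_NAME):
    """Copy src into manual_dir under the expected name if its checksum matches.

    Hypothetical helper, not a TFDS API. manual_dir is assumed to be the
    directory TFDS searches for manually downloaded files.
    """
    actual = sha256_of(src)
    if actual != expected_sha256:
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )
    manual_dir = Path(manual_dir).expanduser()
    manual_dir.mkdir(parents=True, exist_ok=True)
    dest = manual_dir / name
    shutil.copy2(src, dest)
    return dest
```

After staging the file, retrying `tfds.load('multi_news')` should, under the assumption above, use the local copy instead of downloading it again.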