
checksum not matching

Open • cpguest opened this issue on Apr 20 '25

What I need help with / What I was wondering

I am following the directions at https://www.tensorflow.org/datasets/overview to load the caltech_birds2011 dataset into TensorFlow, but I receive an error stating that the checksums do not match.
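For context, a checksum mismatch means the bytes fetched from the dataset URL no longer match the size and digest recorded for that URL. Below is a minimal standard-library sketch of that kind of check. It assumes a SHA-256 digest, which is what TFDS records. It is not TFDS's internal code, and the function names are hypothetical.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large archive never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_size: int, expected_sha256: str) -> None:
    """Raise ValueError if the file's size or SHA-256 differs from the recorded values."""
    size = os.path.getsize(path)
    if size != expected_size:
        raise ValueError(f"size mismatch: expected {expected_size}, got {size}")
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
```

If a server starts returning a different file, such as an HTML error page or a re-packed archive, the check fails even though the download itself "succeeded".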

What I've tried so far

I have tried to enable register_checksums in the following two ways, but in both cases it appears to be ignored and I get the same error:

  • by setting dl_manager._register_checksums=True
  • via the CLI: tfds build --register_checksums
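Conceptually, registering checksums swaps the comparison for a recording step: the size and SHA-256 digest of whatever was downloaded are written out as the new expected values. The sketch below only illustrates that idea. `checksum_record` is a hypothetical helper, and the tab-separated URL / size / digest layout is an assumption about the TFDS checksums.tsv format, not a documented contract.

```python
import hashlib
import os


def checksum_record(url: str, path: str) -> str:
    """Return one tab-separated record: URL, size in bytes, SHA-256 hex digest.

    The column order is assumed to mirror TFDS checksums.tsv lines.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks to keep memory use flat for large archives.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return f"{url}\t{os.path.getsize(path)}\t{digest.hexdigest()}"
```

Recording a checksum this way is only safe for a file you trust, since it accepts whatever bytes were actually downloaded.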

I have also tried downloading the dataset manually and pointing data_dir to the CUB_200_2011 folder. I also tried placing the dataset in tensorflow_datasets/downloads/manual, as described at https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails

Is there something I am missing, or can the checksums be updated to match? Any help you can provide would be appreciated.

Environment information (if applicable)

  • Operating system: Raspberry Pi 5 (Bookworm)
  • Python version: 3.11.12
  • tensorflow-datasets/tfds-nightly version: 4.9.8.dev20250420004
  • tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: 2.19.0

cpguest • Apr 20 '25 02:04

Hi @cpguest, thank you for opening this issue. Could you please paste your error message here?

ccl-core • May 06 '25 08:05

It looks like the dataset URL was updated 10 months ago in #5547, but the checksums weren't updated. I can send a PR to update them.
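The failure described here is a dataset's URL list drifting apart from its checksums file. Below is a minimal consistency check for that situation. It assumes a tab-separated layout of URL, size, and SHA-256 digest, and it treats blank lines and lines starting with `#` as skippable. Both `parse_checksums` and `missing_checksums` are hypothetical names, not TFDS functions.

```python
def parse_checksums(tsv_text: str) -> dict[str, tuple[int, str]]:
    """Map each URL to (size, sha256); extra trailing columns are ignored."""
    records: dict[str, tuple[int, str]] = {}
    for line in tsv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        records[fields[0]] = (int(fields[1]), fields[2])
    return records


def missing_checksums(urls: list[str], tsv_text: str) -> list[str]:
    """Return the dataset URLs that have no recorded checksum entry."""
    known = parse_checksums(tsv_text)
    return [url for url in urls if url not in known]
```

A check like this, run whenever a dataset's URLs change, would flag a new URL that was added without a matching checksum entry.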

SanjaySG • Jun 29 '25 16:06

It looks like the Google Drive links for the dataset are not working, and the dataset has been moved to https://data.caltech.edu/records/65de6-vp158. I'll update the URLs as well.
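Moving the dataset means the URL and its recorded checksum have to change together. Below is a minimal sketch of that edit, applied to the text of a checksums file. It assumes the same tab-separated layout, and the helper name is hypothetical. The size and digest passed in must come from the archive actually downloaded from the new location, not from the old entry.

```python
def replace_checksum_entry(tsv_text: str, old_url: str, new_url: str,
                           size: int, sha256: str) -> str:
    """Drop the record for old_url and append a record for new_url.

    All other lines, including comments and blank lines, keep their order.
    """
    kept = [
        line for line in tsv_text.splitlines()
        if line.split("\t", 1)[0] != old_url
    ]
    kept.append(f"{new_url}\t{size}\t{sha256}")
    return "\n".join(kept) + "\n"
```

Keeping the two changes in one edit avoids exactly the state this issue describes, where the URL changed but the recorded checksum did not.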

SanjaySG • Jun 29 '25 18:06