python-machine-learning-book-3rd-edition

Cannot download TensorFlow datasets

Open • Tianyu00 opened this issue 4 years ago • 11 comments

In ch13/ch13_part1.ipynb, cell [52], I cannot download the celeb_a dataset when running:

celeba_bldr.download_and_prepare()

Error message:

NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM, downloaded to /Users/tz/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8.tmp.db23c1347e5240e68f5f92216dc1872b/uc, has wrong checksum. This might indicate:

  • The website may be down (e.g. returned a 503 status code). Please check the url.
  • For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
  • The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
  • If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally
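One way to tell whether the first two causes apply is to look at what was actually saved. A minimal sketch, assuming that when Drive refuses a download it serves an HTML page instead of the real file (an assumption; TFDS itself only reports the checksum mismatch). `looks_like_html` is a hypothetical helper name:

```python
# Assumption: a rejected Google Drive download saves an HTML page
# (quota or warning page) in place of the real archive/text file.
_HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(path, n_bytes=512):
    """Return True if the first bytes of the file look like an HTML document."""
    with open(path, "rb") as f:
        # Drop leading whitespace and compare case-insensitively.
        head = f.read(n_bytes).lstrip().lower()
    return head.startswith(_HTML_MARKERS)
```

For example, pointing it at the `uc` file named in the error message would show whether Drive returned a web page rather than data; a real `.zip` archive starts with the bytes `PK`, not `<html`.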

Tianyu00, May 19 '20 14:05

Hi, I already commented on this in the TensorFlow GitHub issues; have a look (searching for the link).

This is not an error by the book's authors, although they could mention it in the book or in a comment.

elfelround, May 20 '20 13:05

https://github.com/tensorflow/datasets/issues/965

elfelround, May 20 '20 13:05

As of now there is no solution. The dataset is 2-3 GB, and Google Cloud isn't happy to keep sharing that much bandwidth with so many people. I would recommend uploading the dataset to an external site or CDN.

elfelround, May 20 '20 13:05

I advise uploading this dataset to kaggle.com

elfelround, May 20 '20 13:05

Be aware that I've tried everything (VPNs, changing the code to bypass the checks...) and never managed to get it to work, so there is no practical solution other than the one I'm mentioning.

elfelround, May 20 '20 13:05

Hi elfelround, thank you for your message! Are you an author of or a contributor to this book? I am aware that this is not the author's error. I've searched for this problem and found some discussion online, but I don't really understand it. I think it would be nice to discuss the problem here and point people to the underlying TensorFlow issue, like the link you posted, because other readers of the book may run into the same problem and start searching here.

My actual question now is whether there is any way to download and use the celeb_a dataset, whether from TensorFlow or elsewhere. Luckily, there is no example using this dataset; if there were, nobody could follow it. It would be great if the authors, or others who know more about this problem, could give us some suggestions. Thanks!

Tianyu00, May 21 '20 11:05

Hello, as stated in the error message, the dataset did download, but its SHA checksum did not match the one expected by the TensorFlow source. Make sure you are using the latest version of TensorFlow; this worked for me.
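To see the mismatch for yourself, you can compute the size and SHA-256 digest of the downloaded file and compare them with the values the builder expects. A minimal sketch; I'm assuming the expected values are a byte size plus a SHA-256 hex digest, and `file_checksum`/`matches` are hypothetical helper names:

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hexdigest) for a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so multi-GB archives never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def matches(path, expected_size, expected_sha256):
    """True if both the size and the (case-insensitive) SHA-256 digest match."""
    size, sha = file_checksum(path)
    return size == expected_size and sha == expected_sha256.lower()
```

A tiny file with an HTML error page will fail on size alone, which is usually the quickest hint that the download was rejected rather than corrupted.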

If that is not an option, here [1.3 GiB] is a direct download link for the CelebA dataset, which you can unpack and use as intended.
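A minimal sketch of the unpacking step with Python's standard `zipfile` module, assuming the download is a single `.zip` archive such as `img_align_celeba.zip`; `unpack` is a hypothetical helper and the destination folder is up to you:

```python
import zipfile
from pathlib import Path


def unpack(archive, dest):
    """Extract a .zip archive into dest and return the list of member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        zf.extractall(dest)
        return zf.namelist()
```

Checking the archive with `testzip()` before extracting catches a truncated download early, at the cost of reading the archive twice.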

jinensetpal, Jun 06 '20 22:06

Thanks a lot for helping out here, @elfelround , I really appreciate it!

Regarding the dataset, I would also recommend saving it somewhere on the machine you'll be using, because their servers may have hiccups sometimes. On the original CelebA website, they also provide an alternative Baidu link that may be useful in such cases: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

rasbt, Jun 08 '20 16:06

Hi, if you have trouble downloading the celeb_a dataset with tfds, try the following:

  1. Update tfds: pip install --upgrade tensorflow-datasets
  2. Download the four dataset files manually from https://git.io/JL5GM and save them in ~/tensorflow_datasets/downloads/manual
  3. Run celeba_bldr.download_and_prepare() again. :)
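Before step 3, it can help to confirm that all four files are in place. A minimal sketch, assuming the four file names used in the Colab commands of this same comment and the default `~/tensorflow_datasets/downloads/manual` folder; `missing_manual_files` is a hypothetical helper:

```python
from pathlib import Path

# File names taken from the Colab copy commands in this comment.
CELEBA_MANUAL_FILES = (
    "img_align_celeba.zip",
    "list_attr_celeba.txt",
    "list_eval_partition.txt",
    "list_landmarks_align_celeba.txt",
)

DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"


def missing_manual_files(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the expected CelebA files that are not present in manual_dir."""
    manual_dir = Path(manual_dir)
    return [name for name in CELEBA_MANUAL_FILES if not (manual_dir / name).is_file()]
```

An empty list means every expected file is present; anything else tells you exactly which download to redo.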

If you use Colab, upload the files to your Google Drive, mount it, and copy them as shown below:

```python
from google.colab import drive

# Mount Google Drive; your files then appear under /drive/MyDrive
drive.mount('/drive')

# Create the folder where TFDS looks for manually downloaded files
!mkdir -p ~/tensorflow_datasets/downloads/manual

# Copy the four CelebA files from Drive into that folder
!cp /drive/MyDrive/datasets/celeba/img_align_celeba.zip ~/tensorflow_datasets/downloads/manual
!cp /drive/MyDrive/datasets/celeba/list_attr_celeba.txt ~/tensorflow_datasets/downloads/manual
!cp /drive/MyDrive/datasets/celeba/list_eval_partition.txt ~/tensorflow_datasets/downloads/manual
!cp /drive/MyDrive/datasets/celeba/list_landmarks_align_celeba.txt ~/tensorflow_datasets/downloads/manual
```
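The same copy step can be written in plain Python, which also works outside Colab where `!` shell commands are not available. A minimal sketch; `copy_to_manual` is a hypothetical helper, and the source folder is whatever you used on your Drive or disk:

```python
import shutil
from pathlib import Path

DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"


def copy_to_manual(src_dir, names, manual_dir=DEFAULT_MANUAL_DIR):
    """Copy the named files from src_dir into the TFDS manual-download folder."""
    manual_dir = Path(manual_dir)
    manual_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    copied = []
    for name in names:
        # copy2 also keeps file metadata such as modification time.
        copied.append(Path(shutil.copy2(Path(src_dir) / name, manual_dir / name)))
    return copied
```

In Colab this would be called as `copy_to_manual("/drive/MyDrive/datasets/celeba", [...])` with the four file names above.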

Thanks

rickiepark, Jan 25 '21 02:01

KeyError: <ExtractMethod.NO_EXTRACT: 1> occurs because celeba_bldr.download_and_prepare() needs TFRecord files. So after you have the celeb_a dataset_info.json file and the .txt files (such as list_landmarks_align_celeba.txt and list_attr_celeba.txt), you also need the TFRecord files: https://drive.google.com/drive/folders/1MKQ9sRwr5OOFk3OBzLz91SsgF3MBqvtP?usp=sharing

Folder structure: [screenshot not preserved]
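A rough sketch for checking that a prepared folder has both pieces before calling `download_and_prepare()` again. It assumes the data lives in a versioned folder such as `~/tensorflow_datasets/celeb_a/<version>/` and that shard file names contain `.tfrecord`; both are assumptions about the TFDS layout, so adjust to what you actually see. `check_prepared_dir` is a hypothetical helper:

```python
from pathlib import Path


def check_prepared_dir(version_dir):
    """Return (looks_ok, shard_names) for a folder that should hold prepared TFDS data."""
    version_dir = Path(version_dir)
    if not version_dir.is_dir():
        return False, []
    has_info = (version_dir / "dataset_info.json").is_file()
    # Assumption: shard names contain ".tfrecord",
    # e.g. "<name>-train.tfrecord-00000-of-<N>".
    shards = sorted(p.name for p in version_dir.iterdir() if ".tfrecord" in p.name)
    return has_info and bool(shards), shards
```

This is only a presence check; it does not validate the contents of the shards.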

liqinglin54951, Jan 28 '21 20:01

I also have a problem with the command "celeba_bldr.download_and_prepare()". I constantly get the error message "HTTP code 429", so I am stuck at this point. Is there any solution to this problem?
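HTTP 429 means "Too Many Requests", which fits the Drive rate limiting mentioned earlier in this thread. Waiting and retrying is the only generic remedy; a minimal sketch of spacing out retries with exponential backoff. `retry_with_backoff` is a hypothetical helper, and the delays are arbitrary choices, not TFDS settings:

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=60.0, max_delay=3600.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so many clients don't retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Usage would be `retry_with_backoff(celeba_bldr.download_and_prepare)`. This only spaces out attempts; if the shared Drive quota stays exhausted, the manual download described above in this thread is still the workaround.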

manfredkremer, Jun 26 '23 14:06