Cannot load EMNIST dataset
/!\ PLEASE INCLUDE THE FULL STACKTRACE AND CODE SNIPPET
Short description
Failed to load the emnist dataset.
Environment information
- Operating System: Linux
- Python version: 3.9
- tensorflow-datasets/tfds-nightly version: 4.9.4
- tensorflow/tf-nightly version: 12.6.1
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? ✅
Reproduction instructions
import tensorflow_datasets as tfds
tfds.load("emnist", split=["train"])
If you share a colab, make sure to update the permissions to share it.
Link to logs
Expected behavior
The emnist dataset is loaded successfully.
Additional context
NonMatchingChecksumError: Artifact https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip, downloaded to /root/tensorflow_datasets/downloads/itl.nist.gov_iaui_vip_cs_links_EMNIST_gzipi4VnNviDSrfd9Zju6qv40flc3wr22t8ldulNStS6tmk.zip.tmp.8cdbd18c3c7144529f0a2a11d1829c60/itl, has wrong checksum:
* Expected: UrlInfo(size=535.73 MiB, checksum='fb9bb67e33772a9cc0b895e4ecf36d2cf35be8b709693c3564cea2a019fcda8e', filename='gzip.zip')
* Got: UrlInfo(size=110.12 KiB, checksum='bfd529724d06f22872f32d6649561a57fd25ec17ea51d6f2ad24b96ea0519c34', filename='itl')
To debug, see: https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror
I tried to download the file directly using the link https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip, but I got redirected to the NIST homepage. I think the link is outdated.
@davidshen84 Well spotted, thanks for opening the issue! It seems the URL (https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip) now redirects to https://www.nist.gov/itl which causes the problem.
Did you find the actual link?
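For anyone who wants to confirm the redirect described above, here is a minimal sketch using only the Python standard library. The helper names `resolve_final_url` and `is_redirected_away` are hypothetical, introduced only for illustration, and the first one needs network access. It is a rough diagnostic, not part of TFDS.

```python
import urllib.request
from urllib.parse import urlsplit


def resolve_final_url(url: str, timeout: float = 10.0) -> str:
    """Open `url`, follow any HTTP redirects, and return the final URL.

    Requires network access. The response body is not read.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def is_redirected_away(requested_url: str, final_url: str) -> bool:
    """Return True if the host or path changed between request and response."""
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    return (requested.netloc, requested.path) != (final.netloc, final.path)


if __name__ == "__main__":
    # URL taken from the error message in this issue.
    old_url = "https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip"
    final = resolve_final_url(old_url)
    print("final URL:", final)
    print("redirected away:", is_redirected_away(old_url, final))
```

If the final URL is a landing page rather than the archive, the small download size in the error (110.12 KiB instead of 535.73 MiB) is consistent with an HTML page having been saved in place of the zip file.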
I cannot find any direct download link from the Internet. According to this page, https://www.nist.gov/itl/products-and-services/emnist-dataset, contacting the author is the only way to get the data set.
Sorry, I just noticed that on that NIST page, the "original MNIST dataset" link actually points to the EMNIST dataset. 😅
Can you check if TF can still handle that file?
Thank you
This should be the new emnist dataset URL: https://biometrics.nist.gov/cs_links/EMNIST/gzip.zip
Did you fix the error? If you have solved it, can you tell me how? I also get the same error :(
No. They just need to update the URL. As a workaround, I think you can manually download the archive and put it in the download folder. TFDS should then skip the download and so avoid this bug.
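Before relying on a manually downloaded archive, it may be worth checking it against the checksum from the error message. The sketch below assumes the 64-hex-digit value is a SHA-256 digest, and the names `sha256_of_file` and `matches_expected` are hypothetical helpers, not TFDS functions. Whether the archive at the new URL still matches the old checksum is not confirmed in this thread.

```python
import hashlib
from pathlib import Path

# Expected checksum copied from the NonMatchingChecksumError in this issue.
# Assumption: it is a SHA-256 hex digest.
EXPECTED_SHA256 = "fb9bb67e33772a9cc0b895e4ecf36d2cf35be8b709693c3564cea2a019fcda8e"


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest equals the expected checksum."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location of the manually downloaded archive.
    archive = Path("gzip.zip")
    if archive.exists():
        print("checksum matches:", matches_expected(archive))
```

A mismatch here would suggest either a truncated download or that the hosted file has changed, in which case the registered checksum in TFDS would also need updating.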
Hello, https://github.com/tensorflow/datasets/pull/5401, which should solve the issue, is now merged! Starting tomorrow, the change will be available in tfds-nightly.
Thank you both for letting us know. It was helpful!! :)