d2l-en
WikiText-2 is not a zip file
When I ran either of the following snippets (PyTorch or MXNet version):
from d2l import torch as d2l
batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)
from d2l import mxnet as d2l
batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)
I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/charry/miniconda3/envs/d2l/lib/python3.9/site-packages/d2l/torch.py", line 2443, in load_data_wiki
data_dir = d2l.download_extract('wikitext-2', 'wikitext-2')
File "/home/charry/miniconda3/envs/d2l/lib/python3.9/site-packages/d2l/torch.py", line 3247, in download_extract
fp = zipfile.ZipFile(fname, 'r')
File "/home/charry/miniconda3/envs/d2l/lib/python3.9/zipfile.py", line 1266, in __init__
self._RealGetContents()
File "/home/charry/miniconda3/envs/d2l/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
I think the dataset file on the server has been damaged. I reproduced this error with d2l 1.0.0 through 1.0.3, and it causes errors wherever the WikiText-2 dataset is needed.
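If the server returned an error page instead of the archive, the file saved locally before `zipfile.ZipFile` is called will not be a zip at all. A minimal sketch for checking this with only the standard library follows; `inspect_download` is a hypothetical helper (not part of d2l), and the cache path in the usage comment is an assumption that may differ on your setup.

```python
import zipfile
from pathlib import Path


def inspect_download(path, preview_bytes=300):
    """Report whether `path` is a real zip archive.

    Returns (True, "") for a zip file. Otherwise returns (False, preview),
    where preview is the start of the file decoded as text, which makes an
    HTML/XML error page saved under a .zip name easy to spot.
    """
    path = Path(path)
    if zipfile.is_zipfile(path):
        return True, ""
    # Not a zip: peek at the beginning of the file to see what was saved instead.
    with path.open("rb") as f:
        head = f.read(preview_bytes)
    return False, head.decode("utf-8", errors="replace")


# Hypothetical usage; the cache location below is an assumption:
# ok, preview = inspect_download("../data/wikitext-2-v1.zip")
# if not ok:
#     print("Not a zip archive. File starts with:\n" + preview)
```

If the preview shows an `<Error>` document, the download itself failed and deleting the cached file is needed before retrying.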
One of my pull requests failed because of this error. I also noticed that some pull requests that only fix typos failed their checks for the same reason.
I hope this error can be fixed as soon as possible.
The wikitext-2 dataset URL returns this error:
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>MM9XHEKPABYT4NPW</RequestId>
<HostId>KOjOK6r2VNkvN6gS28B7s2akq8hULUJohhsiCnyrL9RMzjk3RAIvYnVZiHGd6PPVEIDnQHTijnI=</HostId>
</Error>
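Since the server answers with an S3-style XML error rather than the archive, a download script could detect this case explicitly instead of passing the page to `zipfile`. The sketch below parses a body like the one above with `xml.etree.ElementTree`; `parse_s3_error` is a hypothetical name, and it assumes the error document has the `<Error>`/`<Code>`/`<Message>` layout shown here.

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body):
    """Extract (Code, Message) from an S3-style XML error body.

    Returns None if `body` is not an <Error> document, so callers can tell
    a real payload (e.g. zip bytes) apart from an error page.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML at all, e.g. binary zip data
    if root.tag != "Error":
        return None  # well-formed XML, but not an error document
    return root.findtext("Code"), root.findtext("Message")
```

Called on the response above, it returns `('AccessDenied', 'Access Denied')`.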
Having the same issue. Is there an updated URL we can use?
Same issue here. According to the book, the dataset is from
Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv:1609.07843.
That paper links to http://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/, but that site can no longer be reached, and likewise https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip is no longer available. Does anyone have a good mirror for this?
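Until a working mirror is confirmed, one defensive pattern is to try several candidate URLs in order and keep the first response that is actually a zip archive, so an error page is never cached under a .zip name. This is only a sketch: `first_valid_archive` and `fetch_bytes` are hypothetical helpers, and any mirror URL passed in is your own assumption (the `example.com` entry below is a placeholder, not a real mirror).

```python
import io
import urllib.request
import zipfile


def fetch_bytes(url, timeout=30):
    """Download `url` and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def first_valid_archive(urls, fetch=fetch_bytes):
    """Try each candidate URL in order and return (url, data) for the first
    response that is a readable zip archive.

    `fetch` is injectable so the logic can be tested without network access.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except Exception as exc:  # network errors, HTTP 403/404, timeouts
            errors.append(f"{url}: {exc}")
            continue
        if zipfile.is_zipfile(io.BytesIO(data)):
            return url, data
        errors.append(f"{url}: response is not a zip archive")
    raise RuntimeError(
        "no candidate URL returned a zip archive:\n" + "\n".join(errors)
    )


# Hypothetical usage; the second URL is a placeholder, not a real mirror:
# url, data = first_valid_archive([
#     "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip",
#     "https://example.com/mirror/wikitext-2-v1.zip",
# ])
```

Because a mirror could also serve a different version of the file, comparing a checksum of `data` (for example with `hashlib.sha1`) against a trusted value before extracting would be a sensible extra step.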