torchtext.datasets - requests.exceptions.ConnectionError
🐛 Bug
Description of the bug
When I try to use the Multi30k dataset, I get this error:
requests.exceptions.ConnectionError:
This exception is thrown by __iter__ of HTTPReaderIterDataPipe(skip_on_error=False, source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)
To Reproduce
from torchtext.datasets import Multi30k
SRC_LANGUAGE = 'de'
TGT_LANGUAGE = 'en'
train_iter = Multi30k(split='train', language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))
next(iter(train_iter))
Expected behavior
Return a proper iterable where I can iterate over the dataset.
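For comparison, here is a minimal sketch of the kind of iterable I expect, built from a manually downloaded and extracted copy of the data instead of the built-in downloader. It assumes the archive contains line-aligned files such as train.de and train.en; the function name iter_parallel_pairs and the example paths are hypothetical and are not part of torchtext.

```python
def iter_parallel_pairs(src_path, tgt_path, encoding="utf-8"):
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Hypothetical helper for this report, not a torchtext API.
    """
    with open(src_path, encoding=encoding) as src, open(tgt_path, encoding=encoding) as tgt:
        # zip stops at the shorter file, so a length mismatch is silently truncated.
        for src_line, tgt_line in zip(src, tgt):
            yield src_line.rstrip("\n"), tgt_line.rstrip("\n")


# Example usage (paths are assumptions; point them at the extracted files):
# train_iter = iter_parallel_pairs("multi30k/train.de", "multi30k/train.en")
# print(next(iter(train_iter)))
```

This only stands in for the download step; it does not reproduce the datapipe features (sharding, shuffling) that Multi30k normally provides.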
Environment
PyTorch version: 1.13.1+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: GeForce GTX 1650
Nvidia driver version: 442.23
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2592
DeviceID=CPU0
Family=198
L2CacheSize=1536
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2592
Name=Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] numpydoc==1.5.0
[pip3] torch==1.13.1
[pip3] torchdata==0.5.1
[pip3] torchtext==0.14.1
[conda] Could not collect
Additional context
I've been running into issues with the Multi30k dataset for some time now. An earlier problem was resolved by installing the specific combination of torch, torchdata, and torchtext versions listed above. However, even that workaround no longer works. Can you please fix what's broken with this cursed dataset?
Thank you.
I also tried this:
from torchtext.datasets import Multi30k
from torch.utils.data import DataLoader
datapipe = Multi30k(split='train', language_pair=('de', 'en'))
loader = DataLoader(datapipe, drop_last=True, shuffle=False)
next(iter(loader))
Now I get a different error:
Exception: Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.
This exception is thrown by __iter__ of HTTPReaderIterDataPipe(source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)
The environment is the same. The same error also occurs with DataLoader2.
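Since both errors come from the HTTP download step, it may help to check whether the host in the error message is reachable at all, independently of torchtext. The sketch below uses only the standard library; probe_url and its opener parameter are hypothetical helpers written for this report, and the URL is the one from the error message above.

```python
import urllib.error
import urllib.request

# URL copied from the error message above.
MULTI30K_TRAIN_URL = "http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz"


def probe_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send a HEAD request and return (ok, detail) instead of raising.

    Hypothetical diagnostic helper, not part of torchtext or torchdata.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"{type(exc).__name__}: {exc}"


# Example usage (performs a real network request):
# print(probe_url(MULTI30K_TRAIN_URL))
```

If this reports a failure from several networks, the problem is likely on the hosting side and not in my local setup.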