Torchtext 0.6.0 extract_archive broken with Pytorch train.de.gz & train.en.gz datasets
🐛 Bug
Describe the bug
While following the torchtext tutorial at https://pytorch.org/tutorials/beginner/torchtext_translation_tutorial.html, I am unable to download the gzipped English and German datasets from the URLs specified in the example. I receive an error message indicating that downloading any file that doesn't end in .gz is unsupported.
To Reproduce
Steps to reproduce the behavior:
- Go to https://pytorch.org/tutorials/beginner/torchtext_translation_tutorial.html
- Copy all the code into a Jupyter notebook
- Execute code block which imports all torchtext libraries
- Receive error indicating that the filetype is unsupported
Expected behavior
The dataset should download. The utils file in the torchtext source code seems to support .gz files, so this should work fine.
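For reference, extracting a single-file .gz archive needs only the Python standard library. The following is a minimal sketch, assuming the files have already been downloaded locally; `gunzip` is a hypothetical helper name used for illustration and is not torchtext's `extract_archive`.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a single-file .gz archive and return the output path.

    Hypothetical helper for illustration; not part of torchtext.
    """
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    # Default output path: the input path with the trailing ".gz" removed,
    # e.g. "train.de.gz" becomes "train.de".
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest
```

Calling `gunzip("train.de.gz")` on a local copy would write `train.de` next to it, which may serve as a stopgap if the library helper rejects the file.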
Screenshots
Environment
PyTorch version: 1.4.0
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.5.1
Libc version: glibc-2.9
Python version: 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 21:15:04) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.14.256-197.484.amzn2.x86_64-x86_64-with-debian-stretch-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] sagemaker-pytorch-training==2.4.0
[pip3] torch==1.6.0
[pip3] torchtext==0.6.0
[pip3] torchvision==0.7.0
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.2 256 anaconda
[conda] mkl-include 2020.2 256 anaconda
[conda] numpy 1.19.1 py36h30dfecb_0 anaconda
[conda] numpy-base 1.19.1 py36h75fe3a5_0 anaconda
[conda] pytorch 1.4.0 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] sagemaker-pytorch-training 2.4.0 pypi_0 pypi
[conda] torch 1.6.0 pypi_0 pypi
[conda] torchtext 0.6.0 py_1 pytorch
[conda] torchvision 0.7.0 pypi_0 pypi
This old tutorial was run with PyTorch 1.7.1 and torchtext 0.8.1. What happens if you use torchtext 0.8.1?
Please note that different versions of PyTorch and its domain libraries are not compatible with one another; the matching versions have to be used. See https://github.com/pytorch/text#installation
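A quick way to check a version pairing before running the tutorial is sketched below. The 0.8/1.7 pairing comes from the versions mentioned above; the other rows of `EXPECTED_TORCH` are recalled from the compatibility table in the torchtext README and are an assumption that should be verified against that table. `major_minor` and `check_pair` are hypothetical helper names.

```python
# Assumed torchtext -> PyTorch pairings (major.minor only).
# Only the 0.8 / 1.7 row is stated in this thread; verify the rest
# against the installation table in the torchtext README.
EXPECTED_TORCH = {
    "0.5": "1.4",
    "0.6": "1.5",
    "0.7": "1.6",
    "0.8": "1.7",
}


def major_minor(version):
    """Return 'X.Y' from a version string such as '1.7.1' or '1.6.0+cu101'."""
    parts = version.split("+")[0].split(".")
    return ".".join(parts[:2])


def check_pair(torch_version, torchtext_version):
    """Return 'ok', 'mismatch', or 'unknown' for a PyTorch/torchtext pair."""
    expected = EXPECTED_TORCH.get(major_minor(torchtext_version))
    if expected is None:
        return "unknown"
    return "ok" if major_minor(torch_version) == expected else "mismatch"
```

In a live environment the two arguments can be taken from `torch.__version__` and `torchtext.__version__`. Under the assumed table, `check_pair("1.6.0", "0.6.0")` reports a mismatch, while `check_pair("1.7.1", "0.8.1")` reports ok.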