t5_demo can't retrieve CNNDM from drive.google; how to use local copy?
🐛 Bug
Describe the bug
I am following the t5_demo, but it fails when it tries to access the CNN data at https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ
To Reproduce
Steps to reproduce the behavior:
- Get the notebook at t5_demo.
- Try to run it. It gets as far as batch = next(iter(cnndm_dataloader)) (https://pytorch.org/text/stable/tutorials/t5_demo.html#generate-summaries), where cnndm_datapipe = CNNDM(split="test") (https://pytorch.org/text/stable/tutorials/t5_demo.html#datasets).
- Get an error like:
RuntimeError: Google drive link https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ&confirm=t internal error: headers don't contain content-disposition. This is usually caused by using a sharing/viewing link instead of a download link. Click 'Download' on the Google Drive page, which should redirect you to a download page, and use the link of that page.
This exception is thrown by __iter__ of GDriveReaderDataPipe(skip_on_error=False, source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)
Expected behavior
Other reports with similar error messages suggest there may be a timeout issue when retrieving from drive.google. So I downloaded cnn_stories.tgz and dailymail_stories.tgz and unpacked them:
.
├── CNNDM
│   ├── cnn
│   │   └── stories
│   └── dailymail
│       └── stories
How can I modify the calls to retrieve the data from my local copy instead?
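To make the question concrete, here is a rough sketch of the kind of workaround I have in mind. It uses only the standard library and rests on assumptions: that each unpacked file is a `.story` file whose article text is followed by `@highlight` markers introducing the summary lines, and that the files sit under `<root>/<source>/stories/`. The helper names `parse_story` and `iter_local_cnndm` are hypothetical, not part of torchtext. It also ignores the train/val/test split, which the built-in dataset appears to handle separately.

```python
from pathlib import Path
from typing import Iterator, Tuple


def parse_story(text: str) -> Tuple[str, str]:
    """Split one .story file into (article, abstract).

    Assumes everything before the first "@highlight" marker is article
    text and every non-marker line after it is a summary sentence.
    """
    article_lines, highlights = [], []
    in_highlights = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line == "@highlight":
            in_highlights = True
        elif in_highlights:
            highlights.append(line)
        else:
            article_lines.append(line)
    return " ".join(article_lines), "\n".join(highlights)


def iter_local_cnndm(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (article, abstract) pairs from <root>/<source>/stories/*.story."""
    for path in sorted(Path(root).glob("*/stories/*.story")):
        yield parse_story(path.read_text(encoding="utf-8"))
```

With the layout above, `next(iter_local_cnndm("CNNDM"))` would give one (article, abstract) pair. I have not verified whether wrapping this iterator (for example with torchdata's `IterableWrapper`) is enough to stand in for `CNNDM(split="test")` in the rest of the tutorial.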
Environment
% python collect_env.py
Collecting environment information...
PyTorch version: 2.1.0.post100
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.4.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.1.0.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:38:07) [Clang 16.0.6 ] (64-bit runtime)
Python platform: macOS-14.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.3
[pip3] torch==2.1.0.post100
[pip3] torchaudio==2.1.2
[pip3] torchdata==0.7.1
[pip3] torchtext==0.16.1
[pip3] torchvision==0.16.2
[conda] captum 0.7.0 0 pytorch
[conda] numpy 1.26.2 pypi_0 pypi
[conda] numpy-base 1.26.3 py311hfbfe69c_0
[conda] pytorch 2.1.0 gpu_mps_py311hf322ab5_100
[conda] torch 2.1.2 pypi_0 pypi
[conda] torchaudio 2.1.2 pypi_0 pypi
[conda] torchdata 0.7.1 pypi_0 pypi
[conda] torchtext 0.16.1 pypi_0 pypi
[conda] torchvision 0.16.2 pypi_0 pypi