[BUG] - Missing Dependency 'portalocker' in PyTorch NLP Tutorials with torchtext.datasets
Add Link
- https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
- https://pytorch.org/tutorials/beginner/transformer_tutorial.html
Describe the bug
I've been reading some of the PyTorch NLP tutorials, such as text_sentiment_ngrams_tutorial and transformer_tutorial. I noticed that to run them on Google Colab, the package portalocker>=2.0.0 is required to load a dataset from torchtext.datasets. For example, running the following code from the ngrams tutorial results in an error.
Code:
from torchtext.datasets import AG_NEWS
train_iter = iter(AG_NEWS(split='train'))
Error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/cacheholder.py](https://localhost:8080/#) in _assert_portalocker()
37 try:
---> 38 import portalocker # noqa: F401
39 except ImportError as e:
ModuleNotFoundError: No module named 'portalocker'
During handling of the above exception, another exception occurred:
ModuleNotFoundError Traceback (most recent call last)
6 frames
[<ipython-input-3-b3494c76da2c>](https://localhost:8080/#) in <cell line: 3>()
1 import torch
2 from torchtext.datasets import AG_NEWS
----> 3 train_iter = iter(AG_NEWS(split='train'))
[/usr/local/lib/python3.10/dist-packages/torchtext/data/datasets_utils.py](https://localhost:8080/#) in wrapper(root, *args, **kwargs)
191 if not os.path.exists(new_root):
192 os.makedirs(new_root, exist_ok=True)
--> 193 return fn(root=new_root, *args, **kwargs)
194
195 return wrapper
[/usr/local/lib/python3.10/dist-packages/torchtext/data/datasets_utils.py](https://localhost:8080/#) in new_fn(root, split, **kwargs)
153 result = []
154 for item in _check_default_set(split, splits, fn.__name__):
--> 155 result.append(fn(root, item, **kwargs))
156 return _wrap_datasets(tuple(result), split)
157
[/usr/local/lib/python3.10/dist-packages/torchtext/datasets/ag_news.py](https://localhost:8080/#) in AG_NEWS(root, split)
68
69 url_dp = IterableWrapper([URL[split]])
---> 70 cache_dp = url_dp.on_disk_cache(
71 filepath_fn=partial(_filepath_fn, root, split),
72 hash_dict={_filepath_fn(root, split): MD5[split]},
[/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/datapipe.py](https://localhost:8080/#) in class_function(cls, enable_df_api_tracing, source_dp, *args, **kwargs)
137
138 def class_function(cls, enable_df_api_tracing, source_dp, *args, **kwargs):
--> 139 result_pipe = cls(source_dp, *args, **kwargs)
140 if isinstance(result_pipe, IterDataPipe):
141 if enable_df_api_tracing or isinstance(source_dp, DFIterDataPipe):
[/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/cacheholder.py](https://localhost:8080/#) in __init__(self, source_datapipe, filepath_fn, hash_dict, hash_type, extra_check_fn)
205 extra_check_fn: Optional[Callable[[str], bool]] = None,
206 ):
--> 207 _assert_portalocker()
208
209 self.source_datapipe = source_datapipe
[/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/cacheholder.py](https://localhost:8080/#) in _assert_portalocker()
45 raise
46 else:
---> 47 raise ModuleNotFoundError(
48 "Package `portalocker` is required to be installed to use this datapipe."
49 "Please use `pip install 'portalocker>=2.0.0'` or"
ModuleNotFoundError: Package `portalocker` is required to be installed to use this datapipe. Please use `pip install 'portalocker>=2.0.0'` or`conda install -c conda-forge 'portalocker>=2/0.0'`to install the package
I got the same error with another dataset, WikiText2(split='train'), from the transformer_tutorial.
Suggested Fix:
Inform the reader to install portalocker>=2.0.0 in every notebook that uses torchtext.datasets and then restart the notebook.
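As a rough illustration of the suggested fix, a tutorial notebook could run a small check like the one below before loading any dataset. This is only a sketch: `missing_packages` is a hypothetical helper, not part of torchtext or torchdata, and it only tests whether the package can be found, not whether its version is at least 2.0.0.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import.

    Hypothetical helper for illustration; it does not check versions.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check for portalocker before calling anything in torchtext.datasets.
missing = missing_packages(["portalocker"])
if missing:
    print("Missing:", ", ".join(missing))
    print("Run: pip install 'portalocker>=2.0.0' and then restart the notebook.")
else:
    print("portalocker is available.")
```

On Colab the install step itself would typically be a `!pip install 'portalocker>=2.0.0'` cell at the top of the notebook, followed by a restart as described above.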
Describe your environment
- Platform: Google Colab
- CUDA: 12.0
- PyTorch version: 2.0.1+cu118
- torchtext version: 0.15.2+cpu
cc @pytorch/team-text-core @Nayef211