[BUG] - Missing Dependency 'portalocker' in PyTorch NLP Tutorials with torchtext.datasets

Open QasimKhan5x opened this issue 1 year ago • 0 comments

Add Link

  • https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
  • https://pytorch.org/tutorials/beginner/transformer_tutorial.html

Describe the bug

I've been reading some of the PyTorch NLP tutorials, such as text_sentiment_ngrams_tutorial and transformer_tutorial. To run them on Google Colab, the package portalocker>=2.0.0 must be installed before a dataset can be loaded from torchtext.datasets, but the tutorials do not mention this. For example, running the following code from the ngrams tutorial raises a ModuleNotFoundError.

Code:

from torchtext.datasets import AG_NEWS
train_iter = iter(AG_NEWS(split='train'))

Error:

---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    [/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/cacheholder.py](https://localhost:8080/#) in _assert_portalocker()

     37     try:
---> 38         import portalocker  # noqa: F401
     39     except ImportError as e:


ModuleNotFoundError: No module named 'portalocker'


During handling of the above exception, another exception occurred:


ModuleNotFoundError                       Traceback (most recent call last)
6 frames
[<ipython-input-3-b3494c76da2c>](https://localhost:8080/#) in <cell line: 3>()
      1 import torch
      2 from torchtext.datasets import AG_NEWS
----> 3 train_iter = iter(AG_NEWS(split='train'))


[/usr/local/lib/python3.10/dist-packages/torchtext/data/datasets_utils.py](https://localhost:8080/#) in wrapper(root, *args, **kwargs)
    191             if not os.path.exists(new_root):
    192                 os.makedirs(new_root, exist_ok=True)
--> 193             return fn(root=new_root, *args, **kwargs)
    194 
    195         return wrapper


[/usr/local/lib/python3.10/dist-packages/torchtext/data/datasets_utils.py](https://localhost:8080/#) in new_fn(root, split, **kwargs)
    153         result = []
    154         for item in _check_default_set(split, splits, fn.__name__):
--> 155             result.append(fn(root, item, **kwargs))
    156         return _wrap_datasets(tuple(result), split)
    157 


[/usr/local/lib/python3.10/dist-packages/torchtext/datasets/ag_news.py](https://localhost:8080/#) in AG_NEWS(root, split)
     68 
     69     url_dp = IterableWrapper([URL[split]])
---> 70     cache_dp = url_dp.on_disk_cache(
     71         filepath_fn=partial(_filepath_fn, root, split),
     72         hash_dict={_filepath_fn(root, split): MD5[split]},


[/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/datapipe.py](https://localhost:8080/#) in class_function(cls, enable_df_api_tracing, source_dp, *args, **kwargs)
    137 
    138         def class_function(cls, enable_df_api_tracing, source_dp, *args, **kwargs):
--> 139             result_pipe = cls(source_dp, *args, **kwargs)
    140             if isinstance(result_pipe, IterDataPipe):
    141                 if enable_df_api_tracing or isinstance(source_dp, DFIterDataPipe):


[/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/cacheholder.py](https://localhost:8080/#) in __init__(self, source_datapipe, filepath_fn, hash_dict, hash_type, extra_check_fn)
    205         extra_check_fn: Optional[Callable[[str], bool]] = None,
    206     ):
--> 207         _assert_portalocker()
    208 
    209         self.source_datapipe = source_datapipe


[/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/cacheholder.py](https://localhost:8080/#) in _assert_portalocker()
     45             raise
     46         else:
---> 47             raise ModuleNotFoundError(
     48                 "Package `portalocker` is required to be installed to use this datapipe."
     49                 "Please use `pip install 'portalocker>=2.0.0'` or"


ModuleNotFoundError: Package `portalocker` is required to be installed to use this datapipe. Please use `pip install 'portalocker>=2.0.0'` or`conda install -c conda-forge 'portalocker>=2/0.0'`to install the package        

I got the same error with another dataset, WikiText2(split='train'), from the transformer_tutorial.

Suggested Fix:

In every notebook that uses torchtext.datasets, tell the reader to install portalocker>=2.0.0 and then restart the notebook runtime before loading a dataset.
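A tutorial could also check for the dependency up front, so the reader sees a short hint instead of the long traceback above. The following is only a minimal sketch of that idea. The helper names (is_installed, pip_install_command, portalocker_hint) are hypothetical and are not part of torchtext or torchdata. The only facts taken from this report are the module name portalocker and the version spec portalocker>=2.0.0.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(requirement):
    """Build the pip command that installs `requirement` for this interpreter."""
    return [sys.executable, "-m", "pip", "install", requirement]


def portalocker_hint():
    """Return a message for the reader; this helper never installs anything."""
    if is_installed("portalocker"):
        return "portalocker found; torchtext.datasets caching can use it."
    # Version spec copied from the error message in this report.
    cmd = pip_install_command("portalocker>=2.0.0")
    return (
        "portalocker is missing. Run:\n  "
        + " ".join(cmd)
        + "\nthen restart the notebook runtime before loading a dataset."
    )


if __name__ == "__main__":
    print(portalocker_hint())
```

The sketch only reports what to do and does not run pip itself, so it is safe to put at the top of a notebook. On Colab, the reader would then run the printed command in a cell (or the equivalent pip install 'portalocker>=2.0.0') and restart the runtime, as suggested above.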

Describe your environment

  • Platform: Google Colab
  • CUDA: 12.0
  • PyTorch version: 2.0.1+cu118
  • torchtext version: 0.15.2+cpu

cc @pytorch/team-text-core @Nayef211

Posted by QasimKhan5x, Jun 02 '23 14:06