Torchtext datasets not iterable

Open yashrathi-git opened this issue 11 months ago • 1 comments

❓ Questions and Help

Description I did this:

>> train_data, val_data, test_data = Multi30k(split=('train', 'valid', 'test'))
>> next(train_data)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[33], line 1
----> 1 next(train_data)

TypeError: 'ShardingFilterIterDataPipe' object is not an iterator

But looking at the docs here, it should be iterable. I also tried using .__iter__.

yashrathi-git avatar Jul 17 '23 07:07 yashrathi-git

You are doing it right. It's just that the datasets are like Schrödinger's cat: you never know whether they are going to be alive and working when you need them. This has been an issue for years now.

Edit: I just looked into your code. You are using it wrong.

Here is the correct usage:

next(iter(train_data))

This creates an iterator from the iterable, which is what next() expects. It still might not work, though, because, as I said, something is wrong with the datasets.
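To see why next(train_data) raises the TypeError while next(iter(train_data)) does not, here is a minimal sketch that does not need torchtext at all. PairDataset is a hypothetical stand-in for the datapipe, not a torchtext class: like the object in the traceback, it is assumed to define __iter__ but not __next__, so it is iterable without being an iterator.

```python
class PairDataset:
    """Hypothetical stand-in for a datapipe: iterable, but not an iterator."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __iter__(self):
        # Each call returns a fresh iterator over the stored pairs.
        return iter(self._pairs)


# Made-up sample pairs, only for illustration.
train_data = PairDataset([("Ein Hund.", "A dog."), ("Eine Katze.", "A cat.")])

# next() needs an object with __next__; PairDataset has none, so this fails
# with the same kind of TypeError as in the question.
try:
    next(train_data)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# iter() calls __iter__ and returns an iterator, which next() can consume.
first_pair = next(iter(train_data))
```

A for loop (for src, tgt in train_data: ...) calls iter() implicitly, which is why looping over an iterable works even though next() on it directly does not.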

afurkank avatar Aug 04 '23 09:08 afurkank