Converting train_iter to list
❓ Why does this happen when I convert train_iter to a list using list()?
Description
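# Block 1: build the dataset iterator (run once)
import torch
from torchtext.datasets import AG_NEWS
train_iter = AG_NEWS(split='train')
# Block 2: materialize the iterator and print the first 10 items
x = list(train_iter)[:10]
print(x)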
I am running the two blocks of code above in Colab. The first time I run the second block, it returns a list of tuples containing the first 10 elements of train_iter. When I run the second block again, I get an empty list. Why is this happening? (I only run the first block once.)
Output 1
[(3, "Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again."), (3, 'Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\\which has a reputation for making well-timed and occasionally\\controversial plays in the defense industry, has quietly placed\\its bets on another part of the market.'), (3, "Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\\about the economy and the outlook for earnings are expected to\\hang over the stock market next week during the depth of the\\summer doldrums."), (3, 'Iraq Halts Oil Exports from Main Southern Pipeline (Reuters) Reuters - Authorities have halted oil export\\flows from the main pipeline in southern Iraq after\\intelligence showed a rebel militia could strike\\infrastructure, an oil official said on Saturday.'), (3, 'Oil prices soar to all-time record, posing new menace to US economy (AFP) AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.'), (3, 'Stocks End Up, But Near Year Lows (Reuters) Reuters - Stocks ended slightly higher on Friday\\but stayed near lows for the year as oil prices surged past #36;46\\a barrel, offsetting a positive outlook from computer maker\\Dell Inc. (DELL.O)'), (3, "Money Funds Fell in Latest Week (AP) AP - Assets of the nation's retail money market mutual funds fell by #36;1.17 billion in the latest week to #36;849.98 trillion, the Investment Company Institute said Thursday."), (3, 'Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.'), (3, 'Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.'), (3, "Wall St. Bears Claw Back Into the Black NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling band of ultra-cynics, are seeing green again.")]
Output 2
[]
Hi @ASKanse, the datasets in torchtext are iterable-style datasets. The first call to list() materializes the dataset and exhausts the iterator. Calling list() again on the same iterator returns an empty list because there is nothing left to yield.
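The same behavior can be reproduced without torchtext. The sketch below is only an illustration: `make_train_iter` and its three sample rows are hypothetical stand-ins for the real dataset, chosen so the snippet runs anywhere. It shows that a one-shot iterator yields its items once and then has nothing more to give.

```python
# Hypothetical stand-in for a one-shot iterable dataset: a generator that
# yields (label, text) pairs once and is then exhausted.
def make_train_iter():
    samples = [(3, "first headline"), (3, "second headline"), (1, "third headline")]
    for sample in samples:
        yield sample

train_iter = make_train_iter()

first_pass = list(train_iter)   # consumes every item the iterator has
second_pass = list(train_iter)  # same iterator object, nothing left to yield

print(len(first_pass))  # 3
print(second_pass)      # []
```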
You need to create the dataset object again each time you want to iterate over it (for example, at the start of each epoch in a training loop). Creating the iterator is cheap because the data is read lazily.
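A minimal sketch of that pattern follows, again with a hypothetical `make_train_iter` standing in for the `AG_NEWS(split='train')` call from the question so that it runs without torchtext. It builds a fresh iterator for every pass, and uses `itertools.islice` to take the first few items without materializing the whole dataset. The helper name `first_n` is also just an illustration.

```python
from itertools import islice

def make_train_iter():
    """Hypothetical stand-in for AG_NEWS(split='train'): returns a fresh one-shot iterator."""
    return iter([(3, "first headline"), (3, "second headline"), (1, "third headline")])

def first_n(n):
    # Build a new iterator on every call, then take only the first n items.
    return list(islice(make_train_iter(), n))

# Re-running this is safe because each call starts from a fresh iterator.
print(first_n(2))
print(first_n(2))

# Same idea inside a training loop: re-create the iterator at the start of each epoch.
epoch_sizes = []
for epoch in range(2):
    train_iter = make_train_iter()
    epoch_sizes.append(sum(1 for _ in train_iter))

print(epoch_sizes)  # [3, 3]
```

In the Colab setup from the question, this would amount to moving the dataset construction into the same cell as the `list()` call, so each run starts from a new iterator.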