GNews
How can I get more than 100 links?
Hi Ranahaani, could you please explain in detail how we can get more than 100 URLs? I changed the GNews parameter (max result = 10000), but it doesn't work.
By default, Google News returns 100 results. Getting more than 100 requires a custom implementation. I'll try to implement it ASAP.
Hello, how can I get more than 100 links per query?
Hey, is getting more than 100 links an option?
To get more links, you can call gnews inside a loop: begin at a start date, query one period at a time, and advance the start date by one period on each iteration. Example:
    from datetime import date, timedelta
    from gnews import GNews

    google_news = GNews()
    start_date = date(2020, 1, 1)
    end_date = date(2020, 2, 1)
    while start_date < end_date:
        # Set the date and period
        google_news.start_date = (start_date.year, start_date.month, start_date.day)
        google_news.period = '1d'  # Period of 1 day
        # Get the news results
        results = google_news.get_news('obesity')
        # Analyze the results, append them to a list, do whatever you need to do
        # Increment the start date by 1 day
        start_date += timedelta(days=1)
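If you would rather give each query an explicit start and end date, the day-by-day stepping can be pulled into a small helper. This is only a sketch: `daily_windows` is a name I made up, it does not touch gnews at all, and I have not verified whether explicit end dates behave better than `period` here.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

DateTuple = Tuple[int, int, int]


def daily_windows(start: date, end: date) -> Iterator[Tuple[DateTuple, DateTuple]]:
    """Yield (window_start, window_end) tuples, one per day, covering start up to end."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # Tuples in (year, month, day) form, the same shape used elsewhere in this thread
        yield (day.year, day.month, day.day), (nxt.year, nxt.month, nxt.day)
        day = nxt


windows = list(daily_windows(date(2020, 1, 1), date(2020, 2, 1)))
```

Each pair could then be assigned to `google_news.start_date` and `google_news.end_date` (the attributes set in the recursive function further down in this thread) before calling `get_news`.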
I have also written a function that splits the date range in half and calls itself on each half whenever a query comes back at the result limit, until all the data is extracted. I've also added a tqdm bar to show the progress.
    from tqdm.notebook import tqdm_notebook as tqdm
    from datetime import datetime, timedelta

    def get_related_news(google_news, keyword: str, start_date: datetime, end_date: datetime, bar=None) -> list[dict]:
        google_news.start_date = (start_date.year, start_date.month, start_date.day)
        google_news.end_date = (end_date.year, end_date.month, end_date.day)
        if bar is None:
            bar = tqdm(total=(end_date - start_date).days + 1, desc="Getting News")
        # Get the news results
        results = google_news.get_news(keyword)
        num = len(results)
        if num >= 99 and (end_date - start_date) > timedelta(days=4):
            # Recursively call the function for 1/2 the time and add them up
            mid_date = start_date + (end_date - start_date) / 2
            mid_date = datetime(mid_date.year, mid_date.month, mid_date.day)
            # Merge the results
            results = get_related_news(google_news, keyword, mid_date + timedelta(days=1), end_date, bar) \
                + get_related_news(google_news, keyword, start_date, mid_date, bar)
            # Check tqdm bar and close it
            if bar.total == bar.n:
                bar.close()
            # Return directly since results are already sorted
            return results
        sorted_results = sorted(results,
                                key=lambda x: datetime.strptime(x['published date'], "%a, %d %b %Y %H:%M:%S %Z"),
                                reverse=True)
        # Update tqdm bar
        update_proportion = (end_date - start_date).days + 1
        bar.update(update_proportion)
        if bar.total == bar.n:
            bar.close()
        return sorted_results
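To see the halving idea on its own, without network access or a progress bar, here is a simplified sketch that runs against a fake data source. Everything in it is hypothetical: `fetch_all`, `make_fake_fetch`, and the `CAP` constant are names I made up, the cap of 100 is just the figure reported in this thread, and the stopping rule (stop splitting once the range is shorter than one day) differs from the 4-day threshold used above.

```python
from datetime import datetime, timedelta
from typing import Callable, List

CAP = 100  # assumed per-query result cap, as reported in this thread


def fetch_all(fetch: Callable[[datetime, datetime], List[dict]],
              start: datetime, end: datetime) -> List[dict]:
    """Collect items dated within [start, end], halving the range whenever the cap is hit."""
    items = fetch(start, end)
    # Below the cap we assume nothing was cut off; a range under one day cannot be split further
    if len(items) < CAP or (end - start) < timedelta(days=1):
        return items
    mid = start + timedelta(days=(end - start).days // 2)
    # The two halves do not overlap: [start, mid] and [mid + 1 day, end]
    return fetch_all(fetch, start, mid) + fetch_all(fetch, mid + timedelta(days=1), end)


def make_fake_fetch(per_day: int, first: datetime, days: int):
    """Build a stand-in for a news client that truncates every answer to CAP items."""
    data = [{"title": f"item {d}-{i}", "day": first + timedelta(days=d)}
            for d in range(days) for i in range(per_day)]

    def fetch(start: datetime, end: datetime) -> List[dict]:
        hits = [x for x in data if start <= x["day"] <= end]
        return hits[:CAP]

    return fetch


first = datetime(2020, 1, 1)
fake_fetch = make_fake_fetch(per_day=10, first=first, days=60)
everything = fetch_all(fake_fetch, first, first + timedelta(days=59))
```

With 10 fake items per day over 60 days, a single query is truncated to 100 items, while the recursive version recovers all of them. A real client could be wrapped in a `fetch(start, end)` function of the same shape.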