GNews

How can I get more than 100 links?

Open · sametgumus212 opened this issue 2 years ago · 5 comments

Hi ranahaani, could you please explain in detail how we can get more than 100 URLs? I changed the GNews parameter (`max_results=10000`), but it doesn't work.

sametgumus212, Jan 26 '22

By default, Google News returns at most 100 results per query, so getting more than 100 requires a custom implementation. I'll try to implement it ASAP.
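Because the cap applies per query, one workaround (the approach later replies in this thread take) is to split one long date range into shorter windows and run one query per window. A minimal sketch of just the splitting step, assuming end-exclusive windows; `date_windows` is a hypothetical helper, not part of GNews:

```python
from datetime import date, timedelta


def date_windows(start: date, end: date, days: int = 1):
    """Yield (window_start, window_end) pairs covering [start, end).

    Hypothetical helper: each window is `days` long, except possibly the
    last one, which is clipped so it never runs past `end`.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    step = timedelta(days=days)
    current = start
    while current < end:
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


# Example: a 10-day range split into 4-day windows (the last one is shorter)
for window in date_windows(date(2020, 1, 1), date(2020, 1, 11), days=4):
    print(window)
```

Each pair could then be assigned to GNews's `start_date` and `end_date` before calling `get_news()`, so that every query covers a window small enough to stay under the cap.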

ranahaani, Feb 07 '22

Hello, how can I get more than 100 links per query?

IldarAskarov0405, Dec 02 '22

Hey, is getting more than 100 links an option?

khalil0012, Apr 15 '23

To get more links, you can call GNews inside a loop: begin at a start date, query a one-day window, then advance the date by one day and repeat. Example:

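```python
from datetime import date, timedelta

from gnews import GNews

google_news = GNews()
all_results = []

start_date = date(2020, 1, 1)
end_date = date(2020, 2, 1)

while start_date < end_date:
    next_day = start_date + timedelta(days=1)

    # Pin both ends of a one-day window; GNews may ignore `period`
    # once explicit start/end dates are set.
    google_news.start_date = (start_date.year, start_date.month, start_date.day)
    google_news.end_date = (next_day.year, next_day.month, next_day.day)

    # Get the news results for this window (at most 100 per call)
    results = google_news.get_news('obesity')

    # Analyze the results, append them to a list, do whatever you need to do
    all_results.extend(results)

    # Move on to the next day
    start_date = next_day
```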
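When results are collected window by window, the same article can show up in more than one window's results (for example, near window boundaries). A minimal sketch that merges the per-window lists and drops repeats by URL, assuming each result is a dict with a `'url'` key as GNews returns; `merge_unique` is a hypothetical name:

```python
def merge_unique(batches):
    """Flatten lists of result dicts, keeping the first occurrence of each URL.

    Assumes each result is a dict with a 'url' key (the GNews result format);
    results without that key are kept as-is.
    """
    seen = set()
    merged = []
    for batch in batches:
        for item in batch:
            url = item.get('url')
            if url is not None:
                if url in seen:
                    continue  # already collected from an earlier window
                seen.add(url)
            merged.append(item)
    return merged


day1 = [{'title': 'A', 'url': 'https://example.com/a'},
        {'title': 'B', 'url': 'https://example.com/b'}]
day2 = [{'title': 'B again', 'url': 'https://example.com/b'},
        {'title': 'C', 'url': 'https://example.com/c'}]
print([item['title'] for item in merge_unique([day1, day2])])  # ['A', 'B', 'C']
```

Keeping the first occurrence preserves the order in which the windows were queried.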

alearjun, Jul 24 '23

I have also written a function that recursively splits the date range in half whenever a query comes back at (or near) the 100-result cap, until all the data is extracted. I've also added a tqdm bar to show the progress:

```python
from __future__ import annotations  # allows list[dict] hints on Python < 3.9

from datetime import datetime, timedelta

# tqdm.auto uses the notebook widget in Jupyter and a text bar elsewhere
from tqdm.auto import tqdm


def get_related_news(google_news, keyword: str, start_date: datetime,
                     end_date: datetime, bar=None) -> list[dict]:
    """Collect news for `keyword` between start_date and end_date, newest first."""
    google_news.start_date = (start_date.year, start_date.month, start_date.day)
    google_news.end_date = (end_date.year, end_date.month, end_date.day)

    # One progress unit per day in the (inclusive) range
    if bar is None:
        bar = tqdm(total=(end_date - start_date).days + 1, desc="Getting News")

    # Get the news results
    results = google_news.get_news(keyword)

    if len(results) >= 99 and (end_date - start_date) > timedelta(days=4):
        # Probably truncated: recursively query each half of the range
        mid_date = start_date + (end_date - start_date) / 2
        mid_date = datetime(mid_date.year, mid_date.month, mid_date.day)

        # Merge the results, newer half first, so the list stays newest first
        results = (get_related_news(google_news, keyword, mid_date + timedelta(days=1), end_date, bar)
                   + get_related_news(google_news, keyword, start_date, mid_date, bar))

        # Close the tqdm bar once every day has been counted
        if bar.total == bar.n:
            bar.close()

        # Return directly since the merged results are already sorted
        return results

    sorted_results = sorted(
        results,
        key=lambda x: datetime.strptime(x['published date'], "%a, %d %b %Y %H:%M:%S %Z"),
        reverse=True,
    )

    # Update the tqdm bar by the number of days this range covers
    bar.update((end_date - start_date).days + 1)
    if bar.total == bar.n:
        bar.close()

    return sorted_results
```
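The same split-in-half idea can also be written iteratively and decoupled from GNews, which makes it testable offline. A minimal sketch assuming a hypothetical `fetch(lo, hi)` callable that returns at most `cap` results for an inclusive date range; `fake_fetch` below is a stand-in for a function wrapping `google_news.get_news` and is not part of GNews:

```python
from datetime import date, timedelta


def collect_with_bisection(fetch, start, end, cap=100):
    """Collect results for the inclusive range [start, end], splitting capped ranges.

    `fetch(lo, hi)` is a hypothetical callable returning results for the
    inclusive range [lo, hi], truncated to at most `cap` items. A range whose
    result count reaches `cap` is treated as truncated and split in half;
    a single-day range is kept as-is because it cannot be split further.
    """
    collected = []
    pending = [(start, end)]  # stack of ranges still to query
    while pending:
        lo, hi = pending.pop()
        results = fetch(lo, hi)
        if len(results) >= cap and hi > lo:
            # Split into [lo, mid] and [mid + 1 day, hi]; the halves do not overlap
            mid = lo + timedelta(days=(hi - lo).days // 2)
            pending.append((lo, mid))
            pending.append((mid + timedelta(days=1), hi))
        else:
            collected.extend(results)
    return collected


# Fake data source: 30 articles per day in January 2020, capped at 100 per query
ARTICLES = [
    {'url': f'https://example.com/{d}/{i}', 'day': date(2020, 1, 1) + timedelta(days=d)}
    for d in range(31) for i in range(30)
]


def fake_fetch(lo, hi, cap=100):
    in_range = [a for a in ARTICLES if lo <= a['day'] <= hi]
    return in_range[:cap]


news = collect_with_bisection(fake_fetch, date(2020, 1, 1), date(2020, 1, 31))
print(len(news))  # 930, versus 100 from a single query over the whole month
```

To use it with GNews, `fetch` could set `google_news.start_date` and `google_news.end_date` and return `google_news.get_news(keyword)`. Whether Google treats those dates as inclusive is an assumption worth checking, since otherwise the boundary day between two halves could be skipped.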

TenaciousPorcupine, Jan 28 '24