
Filter on category does not really filter articles

Open AndyTheFactory opened this issue 2 years ago • 1 comments

Issue by PrajP Sun Mar 18 02:11:02 2018 Originally opened as https://github.com/codelucas/newspaper/issues/534


Hi, filtering on a category does not really filter articles. I need all the articles under category == 'http://cnn.com/health'. However, I get all the articles on CNN, including politics, money and others. How do I get the articles for only one category? Thanks, Prajakta.

import newspaper

cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)

for category in cnn_paper.category_urls():
    # print(category)
    if category == 'http://cnn.com/health':
        print(category)
        cat_paper = newspaper.build(category, memoize_articles=False)
        # print(cat_paper.articles)  # Expected all articles of category
        for article in cat_paper.articles:
            print(article.url)  # expected all articles only in given category but it prints all the cnn articles.

AndyTheFactory avatar Oct 24 '23 12:10 AndyTheFactory

Comment by racindustries Tue Oct 16 07:45:48 2018


Hi PrajP,

It's only a workaround, of course, but here's how I narrowed my results down to the health category:

import newspaper

cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)

cnn_health_art = []

for article in cnn_paper.articles:
    if "health" in article.url:
        print(article.url)
        cnn_health_art.append(article.url)
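
A plain substring test like the one above will also match URLs where "health" only appears inside a longer word (for example a politics story with "healthcare" in its slug). A slightly stricter variant, as a rough sketch: compare against whole path segments instead. The helper names in_category and filter_by_category are made up for illustration and are not part of newspaper, and the assumption that the category shows up as its own path segment in the article URL may not hold for every site.

```python
from urllib.parse import urlparse


def in_category(url, category="health"):
    """Return True if `category` is one whole segment of the URL path."""
    # "/2018/03/17/health/story/index.html" -> ["2018", "03", "17", "health", ...]
    segments = [s for s in urlparse(url).path.split("/") if s]
    return category in segments


def filter_by_category(urls, category="health"):
    """Keep only the URLs that contain `category` as a full path segment."""
    return [u for u in urls if in_category(u, category)]
```

With the cnn_paper object from above, this could be used as cnn_health_art = filter_by_category(a.url for a in cnn_paper.articles).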

Hope this helps

AndyTheFactory avatar Oct 24 '23 12:10 AndyTheFactory