Filter on category does not really filter articles
Issue by PrajP
Sun Mar 18 02:11:02 2018
Originally opened as https://github.com/codelucas/newspaper/issues/534
Hi, filtering on a category does not actually filter articles. I need only the articles under the category 'http://cnn.com/health', but I get all the articles on CNN, including politics, money, and others. How do I get the articles for just one category? Thanks, Prajakta.
import newspaper
cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)
for category in cnn_paper.category_urls():
    # print(category)
    if category == 'http://cnn.com/health':
        print(category)
        cat_paper = newspaper.build(category, memoize_articles=False)
        # print(cat_paper.articles)  # Expected all articles of category
        for article in cat_paper.articles:
            # Expected only the articles in the given category,
            # but it prints all the CNN articles.
            print(article.url)
Comment by racindustries
Tue Oct 16 07:45:48 2018
Hi PrajP,
It's only a workaround, of course, but here's how I narrowed my results down to the health category:
import newspaper
cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)
cnn_health_art = []
for article in cnn_paper.articles:
    if "health" in article.url:
        print(article.url)
        cnn_health_art.append(article.url)
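One caveat: a plain substring check also matches URLs that merely contain the word, such as a politics story with "healthcare" in its slug. A slightly stricter variant is sketched below. It assumes the section name appears as its own path segment in the article URL, which I have not verified for every CNN URL. The helper names in_section and filter_by_section are hypothetical, and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def in_section(url, section):
    """Return True if `section` is a whole path segment of `url`."""
    # Split the path on "/" and drop empty pieces from leading/trailing slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]
    return section in segments


def filter_by_section(urls, section):
    """Keep only the URLs whose path contains `section` as a full segment."""
    return [u for u in urls if in_section(u, section)]


# Made-up sample URLs (assumption: the section is its own path segment).
sample_urls = [
    "http://cnn.com/2018/03/17/health/example-story/index.html",
    "http://cnn.com/2018/03/17/politics/healthcare-bill/index.html",
    "http://cnn.com/health",
]

health_urls = filter_by_section(sample_urls, "health")
for url in health_urls:
    print(url)
```

With newspaper, the same helper could be applied to [article.url for article in cnn_paper.articles] in place of the sample list.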
Hope this helps