news-extractor topic

List news-extractor repositories

news-please

2.0k Stars, 405 Forks

news-please - an integrated web crawler and information extractor for news that just works

extractnet

182 Stars, 20 Forks

A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

chatWeb

879 Stars, 136 Forks

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points.

textractor

15 Stars, 4 Forks

Extracts the main body text from HTML, intended for news web pages