newspaper issues

OSError: Couldn't open file /usr/local/lib/python3.7/dist-packages/newspaper/resources/text/stopwords-th.txt

It cannot be used with the Thai language.

5hyfilm-zz

set daemon attribute directly instead of calling setDaemon function to clear deprecation warning in python3.10

Fixes #883

Narendra-Neerukonda
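
For readers unfamiliar with the deprecation this change addresses, here is a minimal sketch of the pattern using only the standard `threading` module. The helper name `make_daemon_thread` is hypothetical and is not taken from the newspaper code base.

```python
import threading


def make_daemon_thread(target):
    """Create (but do not start) a daemon thread for `target`.

    `make_daemon_thread` is a hypothetical helper for illustration only.
    """
    thread = threading.Thread(target=target)
    # Assign the attribute directly. Thread.setDaemon(True) still works,
    # but it emits a DeprecationWarning on Python 3.10 and later.
    thread.daemon = True
    return thread


if __name__ == "__main__":
    t = make_daemon_thread(lambda: None)
    t.start()
    t.join()
```

Passing `daemon=True` to the `threading.Thread` constructor is an equivalent alternative.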

Scrape og:image:secure_url og:image:url

Fixes #403 and removes some unnecessary variables, simplifying meta image scraping logic.

mamoit
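
As a rough illustration of what scraping these Open Graph properties can look like, the sketch below uses the standard library's `html.parser` rather than newspaper's own extractor. The names `OgImageParser` and `find_og_image`, and the priority order of the properties, are assumptions made for this example and are not taken from the pull request.

```python
from html.parser import HTMLParser

# Assumed priority order; the pull request text does not state one.
OG_IMAGE_PROPERTIES = ("og:image:secure_url", "og:image:url", "og:image")


class OgImageParser(HTMLParser):
    """Collect the first content value seen for each og:image property."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        # Keep only the first non-empty value for each known property.
        if prop in OG_IMAGE_PROPERTIES and content and prop not in self.found:
            self.found[prop] = content


def find_og_image(html):
    """Return the best og:image candidate in `html`, or None if absent."""
    parser = OgImageParser()
    parser.feed(html)
    parser.close()
    for prop in OG_IMAGE_PROPERTIES:
        if prop in parser.found:
            return parser.found[prop]
    return None
```

Calling `find_og_image(page_html)` on a page that declares only `og:image` falls back to that value.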

poor top_image results (improve when dimension check on og:image added)

We found that the `top_image` for a noticeable number of stories in a sample of news we were working on returned favicons. This happened on stories from popular and large...

rahulbot

Include all nodes with text

2

Closes #363 - To tackle missing article paragraphs, this suggestion considers any node with text to be included in the final text attribute of an article instance - Test cases...

jecarr

get publish date failed

3

Tested 1,000 URLs from 100 websites; for 90% of them the publish date is None.

saha65536

Blacklist tags when parsing

Hi! Is there a way to blacklist certain tags so that any text inside them is skipped entirely instead of being parsed? For example when I parse a page I...

kaytrance
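
Whether newspaper itself offers such an option is not stated here. As a generic sketch of the idea, the standard library's `html.parser` can drop text nested inside chosen tags before extraction. `BlacklistTextExtractor`, `extract_text`, and the default tag names are hypothetical and are not part of newspaper's API.

```python
from html.parser import HTMLParser


class BlacklistTextExtractor(HTMLParser):
    """Collect text, skipping anything nested inside blacklisted tags."""

    def __init__(self, blacklist):
        super().__init__()
        self.blacklist = set(blacklist)
        self.depth = 0  # greater than 0 while inside a blacklisted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.blacklist:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.blacklist and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.chunks.append(text)


def extract_text(html, blacklist=("aside", "figcaption")):
    """Return the visible text of `html` without blacklisted subtrees.

    The default tag names are examples only. Tags that never get a closing
    tag (such as "img" or "br") should not be put on the blacklist, since
    the depth counter would never return to zero.
    """
    parser = BlacklistTextExtractor(blacklist)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

One possible use is to strip the unwanted subtrees first and hand only the remaining markup or text to the article parser.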

pyinstaller exe file error

I used PyInstaller to build an exe file. When the exe runs, `parse()` raises an exception. I traced the problem to these lines in article.py: `meta_lang = self.extractor.get_meta_lang(self.clean_doc)` and `self.set_meta_language(meta_lang)`. If I run it as `python xx.py`, it runs fine...

saha65536

newspaper.fulltext AttributeError

14

AttributeError: 'NoneType' object has no attribute 'xpath'

Repro with python3:

>>> import requests
>>> import newspaper
>>> resp = requests.get("https://capitalandgrowth.org/questions/1250/hair-salon-appointments-what-is-the-best-exit-inte.html")
>>> newspaper.fulltext(resp.text)
  File "/usr/local/lib/python3.7/site-packages/newspaper/api.py", line 91, in fulltext
    top_node =...

trevlovett

bug

newspaper.build not complete?

1

I ran newspaper.build for my URL. It returned some data, but the result is not complete. Is there anything I need to pay attention to?

nickhuangxinyu
