
newspaper.article.ArticleException obscures underlying cause of exception

Open AndyTheFactory opened this issue 2 years ago • 1 comments

Issue by dviator Fri Jul 8 01:57:25 2016 Originally opened as https://github.com/codelucas/newspaper/issues/268


Hey, love the library, but I am having a little trouble with the way that newspaper.article.ArticleException works. My use case is to directly call article.download() and article.parse() on a list of URLs that I'm feeding from another part of my application. The basic issue is that newspaper.article.ArticleException does not differentiate between different causes of failure: in my case, between network timeouts and malformed pages. Quick shell test case here:

The impact on my application is that I wrap the calls to article.parse() in a retry block so that intermittent network latency can be overcome while my application runs continuously. However, when I run into malformed pages, I'd like the application to notice the incomplete response and skip right over them, but it retries instead, causing a large and unnecessary performance impact when there are many consecutive malformed pages. I'm sure I could perform some additional checking in the wrapper code, but the cleaner solution seems to be to throw one exception when article.download() fails and a different one when article.parse() fails, or to differentiate the cause of errors some other way. In fact, when I first started using the library, it was a source of confusion that article.download() would not throw an exception when the network was disabled.
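To make the proposal concrete, here is a rough sketch of the retry wrapper I have in mind if the two failure modes were distinguishable. Everything in it is an assumption for illustration: DownloadError, ParseError and fetch_with_retry are hypothetical names, not part of newspaper, and the download and parse callables stand in for whatever wraps article.download() and article.parse().

```python
import time


class DownloadError(Exception):
    """Hypothetical: transient network failure (e.g. a timeout)."""


class ParseError(Exception):
    """Hypothetical: the page was fetched but is malformed."""


def fetch_with_retry(url, download, parse, retries=3, delay=0.0):
    """Retry only download failures; skip malformed pages immediately.

    `download(url)` returns raw HTML or raises DownloadError.
    `parse(html)` returns the parsed result or raises ParseError.
    Returns the parsed result, or None if the page is skipped.
    """
    for attempt in range(1, retries + 1):
        try:
            html = download(url)
        except DownloadError:
            if attempt == retries:
                raise  # network still failing after the last attempt
            time.sleep(delay)  # wait before the next attempt
            continue
        try:
            return parse(html)
        except ParseError:
            return None  # malformed page: retrying would not help
```

With a split like this, a run of consecutive malformed pages costs one download each instead of a full retry cycle, while network timeouts are still retried up to the limit.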

I would be happy to work on contributing a solution if the above seems reasonable to you all who have been maintaining this excellent code. In any event would love to hear your thoughts.

AndyTheFactory avatar Oct 24 '23 10:10 AndyTheFactory

Comment by silviaegt Fri Jul 13 22:29:49 2018


Did you get an answer on this, @maevyn11? In my case it was a "failed with 404 Client Error: Not Found for url" problem. I tried to avoid this with a try / except Exception: pass block, but it didn't work...

AndyTheFactory avatar Oct 24 '23 10:10 AndyTheFactory