
Invalid filtering

Open AndyTheFactory opened this issue 2 years ago • 0 comments

Issue by ZeeshanSultan, Mon May 28 13:26:47 2018. Originally opened as https://github.com/codelucas/newspaper/issues/572


https://github.com/codelucas/newspaper/blob/c521057b20bb3d4cd27d8b0ee6efd64d1d3a488f/newspaper/urls.py#L239

The validator first applies blacklist-based filters to reject bad URLs, then whitelist-based filters to accept valid ones, but when neither matches, the default response is False. It should be True: a URL that reaches that point has already passed every blacklist filter, and the whitelist filters are not broad enough to be the only way in, since they are based on a very limited set of keywords.

Here's an example site that doesn't get detected: http://jewishnews.net.au
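To make the point concrete, here is a minimal sketch of the blacklist-then-whitelist-then-default flow. It is not the code in `newspaper/urls.py`: the function name `looks_like_article`, the `default` parameter, and both keyword sets are hypothetical stand-ins chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical keyword sets, for illustration only (not newspaper's real lists).
BAD_CHUNKS = {"careers", "contact", "about", "login", "signup"}
GOOD_CHUNKS = {"story", "article", "news", "blog"}


def looks_like_article(url: str, default: bool = False) -> bool:
    """Blacklist check, then whitelist check, then fall back to `default`."""
    path = urlparse(url).path
    chunks = [chunk.lower() for chunk in path.split("/") if chunk]

    # Blacklist: any bad keyword in the path rejects the URL.
    if any(chunk in BAD_CHUNKS for chunk in chunks):
        return False

    # Whitelist: any good keyword in the path accepts the URL.
    if any(chunk in GOOD_CHUNKS for chunk in chunks):
        return True

    # Neither list matched, so the fallback decides the outcome.
    return default


url = "http://jewishnews.net.au"
print(looks_like_article(url))                # fallback False: URL is dropped
print(looks_like_article(url, default=True))  # fallback True: URL is kept
```

With a False fallback, only URLs that happen to contain a whitelisted keyword survive. With a True fallback, the blacklist still rejects what it was written to reject, and everything else gets through.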

AndyTheFactory, Oct 24 '23 12:10