newspaper4k issues

Cloudflare Issue with CRHOY.com

2

**CRHOY:** This is a Cloudflare issue, so I don't know if this is the right place to post, but if anyone can help I'd be very thankful. > crhoy.com **Some...

gabrielgq

sites not working

[BUG] Google News link schema changed?

6

For decoding Google News URLs into their real ones, I am getting an error ```python import base64 import re # Some url encoding related constants _ENCODED_URL_PREFIX = "https://news.google.com/rss/articles/" _ENCODED_URL_PREFIX_WITH_CONSENT = (...

moehmeni

bug

[BUG] fail to get images from article

**Describe the bug** get wrong images from the article **To Reproduce** run this code
```python
from newspaper import Article

url = 'https://www.24h.com.vn/thoi-trang-hi-tech/iphone-noi-tieng-mot-thoi-nay-gia-re-co-man-oled-camera-chup-dep-c407a1590584.html'
a = Article(url)
a.download()
a.parse()
a.images
```
**Expected...

thangckt

bug

Consider switching from lxml's clean_html for enhanced security (and possibly performance)

1

**Issue by [frenzymadness](https://github.com/frenzymadness)** _Wed Aug 30 08:12:19 2023_ _Originally opened as https://github.com/codelucas/newspaper/issues/972_ ---- I'd like to bring to your attention that we are [discussing](https://bugs.launchpad.net/lxml/+bug/1958539) the possibility of removing lxml's clean_html...

AndyTheFactory

enhancement

refactoring

[SITES] www.mprnews.org

### First please check that it is really an issue with the library, and not some special case of website: - [x] There is no paywall - [x] You do...

palfrey

sites not working

[SITES] https://www.scientificamerican.com/article/china-has-plans-for-the-worlds-largest-particle-collider/

### First please check that it is really an issue with the library, and not some special case of website: - [x] There is no paywall - [x] You do...

palfrey

sites not working

How to get the list of all websites that are available for scraping?

2

**Issue by [aleksandar-devedzic](https://github.com/aleksandar-devedzic)** _Sun Jul 18 16:28:56 2021_ _Originally opened as https://github.com/codelucas/newspaper/issues/903_ ---- Is there a way to get a list of websites that can be crawled properly with newspaper...

AndyTheFactory

documentation

It turns out that a lot of sites do not work with

3

**Issue by [alekssamos](https://github.com/alekssamos)** _Fri Feb 25 08:35:22 2022_ _Originally opened as https://github.com/codelucas/newspaper/issues/937_ ---- I am completely disenchanted. Why these dictionaries, key stop words? From many sites, instead of the text...

AndyTheFactory

bug

sites not working

Make lxml requirement less restrictive

Since lxml version 5.2.0, lxml.html.clean (required by newspaper) got extracted into a separate library. Using the [html_clean] extra allows for lxml versions >= 5.2.0 (for older versions the extra will...

t1h0
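The packaging change described in this entry can be checked locally. The snippet below is a minimal sketch, not code from newspaper4k: the helper name `find_html_cleaner_module` is hypothetical, and it assumes the standalone package is importable as `lxml_html_clean`. It reports which module, if any, currently provides a `Cleaner` class in the running environment.

```python
import importlib


def find_html_cleaner_module():
    """Return the name of the first importable module exposing Cleaner, or None.

    Hypothetical helper for illustration only; not part of newspaper4k or lxml.
    """
    # Assumption: the extracted library is importable as `lxml_html_clean`.
    # `lxml.html.clean` is the location used before the split in lxml 5.2.0.
    candidates = ("lxml_html_clean", "lxml.html.clean")
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is missing, or lxml refuses the
            # import because the cleaner is not installed alongside it.
            continue
        if hasattr(module, "Cleaner"):
            return name
    return None


if __name__ == "__main__":
    print(find_html_cleaner_module())
```

A result of `None` would suggest that neither layout is available, which is the situation the `[html_clean]` extra is meant to address on lxml >= 5.2.0.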

[SITES] https://www.bizjournals.com

### First please check that it is really an issue with the library, and not some special case of website: - [ X ] There is no paywall - [...

tbrox

sites not working
