
Cloudflare Issue with CRHOY.com

Open gabrielgq opened this issue 6 months ago • 2 comments

CRHOY:

This is a Cloudflare issue, so I don't know if this is the right place to post, but if anyone can help I'd be very thankful.

crhoy.com

Some sample URLs that I have tried:

crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular
crhoy.com/economia/empresarios-piden-avanzar-en-proyectos-para-mejorar-la-competitividad

The exact code I used to test these articles/this website:


import newspaper

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = newspaper.configuration.Configuration()
config.browser_user_agent = user_agent


article = newspaper.article('https://www.crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular/', config=config)
print(article.text)

The site is protected by Cloudflare. I tried more complex methods with readability and Selenium, and even tried 12ft.io and http://txtify.it.
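To confirm that a failed fetch is really a Cloudflare challenge and not a parsing problem, a check along these lines may help. This is only a minimal sketch: the function name `looks_like_cloudflare_challenge` and the body markers are assumptions chosen for illustration, not part of newspaper4k, and Cloudflare can change its challenge pages at any time.

```python
from typing import Mapping

# Heuristic markers only (an assumption): strings commonly seen on
# Cloudflare challenge pages. They are not a stable, documented contract.
CHALLENGE_BODY_MARKERS = ("just a moment", "cf-chl", "challenge-platform")


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], body: str
) -> bool:
    """Return True if an HTTP response resembles a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalise names and values first.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}

    # Cloudflare marks responses where it served a challenge with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Otherwise require all three weaker signals together to limit false positives.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    blocked_status = status in (403, 503)
    has_marker = any(marker in body.lower() for marker in CHALLENGE_BODY_MARKERS)
    return served_by_cloudflare and blocked_status and has_marker
```

If the check returns True, the HTML never contained the article, so changing the user agent or the parser is unlikely to help. If the page HTML can be obtained some other way, it should be possible to hand it to the parser with `Article.download(input_html=html)` followed by `parse()`, so that no network request is made by newspaper itself.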

gabrielgq, Aug 08 '24 20:08