newspaper4k
Cloudflare Issue with CRHOY.com
CRHOY:
This is a Cloudflare issue, so I don't know if this is the right place to post, but if anyone can help I'd be very thankful.
crhoy.com
Some sample URLs that I have tried:
crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular
crhoy.com/economia/empresarios-piden-avanzar-en-proyectos-para-mejorar-la-competitividad
The exact code I used to test these articles and the website:
import newspaper
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = newspaper.configuration.Configuration()
config.browser_user_agent = user_agent
article = newspaper.article('https://www.crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular/', config=config)
print(article.text)
The site is protected by Cloudflare. I also tried more complex methods with readability and Selenium, and even tried 12ft.io and http://txtify.it.
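Because the failure is attributed to Cloudflare, it can help to first confirm that the HTML being fetched is actually a Cloudflare challenge page rather than the article itself. The helper below is a hypothetical sketch: the name `looks_like_cloudflare_challenge` and its marker strings are assumptions based on commonly observed Cloudflare challenge pages, not a documented Cloudflare or newspaper4k API. It only diagnoses the block; it does not get around it.

```python
from typing import Mapping

# Hypothetical diagnostic helper (not part of newspaper4k).
# Marker strings are assumptions drawn from commonly seen challenge pages.
CHALLENGE_MARKERS = ("just a moment", "cf-chl", "_cf_chl_opt", "challenge-platform")


def looks_like_cloudflare_challenge(
    status_code: int, headers: Mapping[str, str], body: str
) -> bool:
    """Heuristically flag a response that resembles a Cloudflare challenge.

    Returns True only when the response is served by Cloudflare AND either
    uses a status commonly returned by challenge pages (403/503) or contains
    one of the assumed challenge markers in its body.
    """
    # Normalize header names, since HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = lowered.get("server", "").strip().lower() == "cloudflare"

    blocked_status = status_code in (403, 503)
    body_lower = body.lower()
    has_marker = any(marker in body_lower for marker in CHALLENGE_MARKERS)

    # Cloudflare also serves normal pages, so the header alone is not enough.
    return served_by_cloudflare and (blocked_status or has_marker)
```

With the `requests` library, a response `r` could be checked as `looks_like_cloudflare_challenge(r.status_code, r.headers, r.text)`. If this returns True, the page newspaper4k parses is most likely the challenge interstitial, which would explain why `article.text` comes back without the article body.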