Should newspaper3k bypass the paywall on ft.com or medium.com?
This issue asks about the intended behavior of newspaper3k. Some media sites (e.g. ft.com and medium.com) put articles behind a paywall, and newspaper3k does not get past it. For example, when you parse https://www.ft.com/content/2f081189-01dd-4549-a6b0-ab4f04a103cd, you get
title: Subscribe to read
text: Become an FT subscriber to read:
Leverage our market expertise
Expert insights, analysis and smart data help you cut through the noise to spot trends, risks and opportunities.
Join over 300,000 Finance professionals who already subscribe to the FT.
The same thing happens on medium.com.
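For reference, here is a minimal sketch of how the result above can be reproduced and recognized as a paywall stub. `fetch_article` uses the standard newspaper3k `Article` download/parse calls; `looks_paywalled` and `PAYWALL_MARKERS` are hypothetical helpers made up for illustration, and the marker phrases are just taken from the output shown above, so they would need adjusting for other sites.

```python
# Assumed marker phrases, copied from the FT output above (lowercased).
PAYWALL_MARKERS = ("subscribe to read", "become an ft subscriber")


def looks_paywalled(title: str, text: str) -> bool:
    """Heuristic: True if the title or text contains a known paywall phrase."""
    haystack = f"{title}\n{text}".lower()
    return any(marker in haystack for marker in PAYWALL_MARKERS)


def fetch_article(url: str):
    """Download and parse a URL with newspaper3k; returns (title, text)."""
    # Imported lazily so the heuristic above works without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # plain HTTP fetch, no browser involved
    article.parse()
    return article.title, article.text


# Usage (needs network access and `pip install newspaper3k`):
# title, text = fetch_article(
#     "https://www.ft.com/content/2f081189-01dd-4549-a6b0-ab4f04a103cd")
# print("title:", title)
# print("paywalled:", looks_paywalled(title, text))
```

The check is only a string match, so it tells you that you got the subscription page instead of the article, nothing more.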
Technically there are ways to bypass it (e.g. https://github.com/iamadamdev/bypass-paywalls-chrome). Should newspaper3k support bypassing paywalls?
That extension has to run inside a web browser, so it will not work with Newspaper, which fetches pages with Python requests.
Sorry for confusing you. I have no intention of parsing a physical paper.
Try parsing with 12ft.io
What do you mean by "try parsing with 12ft.io"? Can you provide a code example?
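One possible reading of that suggestion, as an untested sketch: 12ft.io is normally used by putting its address in front of the target URL, so the prefixed URL could be handed to newspaper3k like any other. The prefix format is an assumption here, `proxied_url` and `parse_via_proxy` are hypothetical helper names, and there is no guarantee that the service is reachable or returns the full article for a given site.

```python
# Assumption: the service expects the target URL appended to its own address.
PROXY_PREFIX = "https://12ft.io/"


def proxied_url(url: str, prefix: str = PROXY_PREFIX) -> str:
    """Build the proxied address by prepending the proxy prefix to the URL."""
    return prefix + url


def parse_via_proxy(url: str):
    """Parse the proxied page with newspaper3k; returns (title, text)."""
    # Imported lazily so proxied_url works without newspaper3k installed.
    from newspaper import Article

    article = Article(proxied_url(url))
    article.download()
    article.parse()
    return article.title, article.text


# Usage (needs network access and `pip install newspaper3k`):
# title, text = parse_via_proxy(
#     "https://www.ft.com/content/2f081189-01dd-4549-a6b0-ab4f04a103cd")
# print(title)
```

If the proxy answers with its own error or landing page, newspaper3k will simply parse that page, so it is worth checking the returned title before trusting the text.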