
Should newspaper3k bypass a wall on ft.com or medium.com?

Open nwatab opened this issue 2 years ago • 4 comments

This issue asks about the intended behavior of newspaper3k. Some media sites (e.g. ft.com and medium.com) put their articles behind a paywall, and newspaper3k does not get past it. For example, when you parse https://www.ft.com/content/2f081189-01dd-4549-a6b0-ab4f04a103cd, you get

title: Subscribe to read
text: Become an FT subscriber to read:

Leverage our market expertise

Expert insights, analysis and smart data help you cut through the noise to spot trends, risks and opportunities.

Join over 300,000 Finance professionals who already subscribe to the FT.

A similar thing happens on medium.com.
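For reference, output like the above can be reproduced with a minimal sketch along these lines. It uses the standard newspaper3k `Article` download/parse flow; the `looks_paywalled` helper and its marker phrases are hypothetical, taken only from the stub text quoted above, and are not part of newspaper3k.

```python
# Hypothetical marker phrases, lifted from the FT stub quoted in this issue.
# They are an assumption for illustration, not anything newspaper3k provides.
PAYWALL_MARKERS = ("subscribe to read", "become an ft subscriber")


def looks_paywalled(title: str, text: str) -> bool:
    """Return True if the parsed title or text contains a known stub phrase."""
    haystack = f"{title}\n{text}".lower()
    return any(marker in haystack for marker in PAYWALL_MARKERS)


def parse_article(url: str):
    """Download and parse a URL with newspaper3k; return (title, text)."""
    # Imported lazily so the helper above can be used without the package.
    from newspaper import Article  # third-party: pip install newspaper3k

    article = Article(url)
    article.download()  # plain HTTP fetch, no browser or JavaScript involved
    article.parse()
    return article.title, article.text


if __name__ == "__main__":
    url = "https://www.ft.com/content/2f081189-01dd-4549-a6b0-ab4f04a103cd"
    title, text = parse_article(url)
    print("title:", title)
    print("text:", text)
    if looks_paywalled(title, text):
        print("Only the subscription stub was returned, not the article.")
```

A check like this only detects that the stub came back; it does not retrieve the article itself.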

Technically, there are ways to bypass the wall (e.g. https://github.com/iamadamdev/bypass-paywalls-chrome). Should newspaper3k support this?

nwatab avatar Jan 14 '22 04:01 nwatab

That extension has to run inside a web browser, so it will not work with Newspaper, which fetches pages with the Python requests library.

johnbumgarner avatar Jan 14 '22 16:01 johnbumgarner

Sorry for confusing you. I have no intention of parsing a physical paper.

nwatab avatar Jan 15 '22 01:01 nwatab

Try parsing with 12ft.io

pasenidis avatar Mar 26 '22 20:03 pasenidis

What do you mean by "try parsing with 12ft.io"? Can you provide a code example?

johnbumgarner avatar Mar 26 '22 21:03 johnbumgarner