php-article-extractor

Bypass cookie wall?

Open kylescousin opened this issue 5 years ago • 2 comments

For example: https://www.hln.be/nieuws/binnenland/prins-laurent-in-beroep-tegen-dotatiesanctie-opgelegd-door-regering~a81f63c8/

The page has a pre-screen to accept cookies, so the extractor tries to parse that rather than the actual article.

Can anything be done about this?

kylescousin, Jul 20 '18 17:07

Unfortunately I haven't found a good way to do this yet. Will keep this open in case I run across one. We also have trouble with redirects that detect lack of cookies. See #20

crscheid, Jul 20 '18 21:07

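I'm not sure if this is similar to the problem I found. On some websites, when checking for redirects, the URL passed from checkForRedirects() is a JSON string. Apparently, there is a "Location" somewhere in that JSON, and the redirect check picks it up as if it were a header. I used this regex to avoid that: preg_match('/\b[Ll]ocation: (.*)/', $a, $r). The \b requires a word boundary before "Location", so keys such as "geoLocation" no longer match. Hope this helps.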

bogdangrab, Mar 26 '19 10:03
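
The regex in the last comment can still match the text "Location: " inside a response body. A stricter variant, sketched below, looks only at the header block and anchors the match to the start of a line. This is a sketch only, under the assumption that the string being searched is a raw HTTP response (headers, a blank line, then the body). findRedirectLocation() is a hypothetical helper and not part of php-article-extractor.

```php
<?php
// Hypothetical helper for illustration; not part of php-article-extractor.
// Assumes $rawResponse is a raw HTTP response: headers, a blank line, then the body.
function findRedirectLocation(string $rawResponse): ?string
{
    // Headers end at the first blank line; everything after it is the body
    // (which may be JSON that happens to contain the word "Location").
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    $headers = $parts[0];

    // "m" anchors ^ and $ to line boundaries, "i" ignores case, so only a
    // line that starts with "Location:" counts as a redirect header.
    if (preg_match('/^Location:[ \t]*(.*)$/mi', $headers, $matches)) {
        // trim() drops a trailing "\r" left over from CRLF line endings.
        return trim($matches[1]);
    }

    return null;
}
```

With this approach a JSON body containing "Location: " is ignored, because the search never reaches past the blank line that ends the headers.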