php-article-extractor
Bypass cookie wall?
For example: https://www.hln.be/nieuws/binnenland/prins-laurent-in-beroep-tegen-dotatiesanctie-opgelegd-door-regering~a81f63c8/
This page shows a pre-screen asking you to accept cookies, so the extractor tries to parse that screen rather than the actual article.
Can anything be done about this?
Unfortunately I haven't found a good way to do this yet. Will keep this open in case I run across one. We also have trouble with redirects that detect lack of cookies. See #20
I'm not sure whether this is related to the problem I found. On some websites, when checking for redirects, the URL passed from checkForRedirects() is a JSON string rather than a real URL. Apparently there is a "Location" somewhere in that JSON, and it gets picked up as if it were the redirect header. I used this regex to avoid matching it: preg_match('/\b[Ll]ocation: (.*)/', $a, $r). Hope this helps.
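For anyone hitting the same thing, here is a minimal sketch of a slightly stricter variant of that idea. It assumes the redirect check has the raw response headers available as one string; the function name extractLocationHeader() and the sample inputs are hypothetical and not part of the library. The pattern is anchored to the start of a line, so a "Location" key inside a JSON body is not mistaken for a header.

```php
<?php
// Hypothetical helper, not part of php-article-extractor.
// Returns the value of the Location header, or null if there is none.
function extractLocationHeader(string $rawHeaders): ?string
{
    // ^ with the m flag only matches at the start of a line, so text such as
    // {"Location": "..."} inside a JSON body does not match.
    // The i flag accepts both "Location:" and "location:".
    if (preg_match('/^location:[ \t]*(.+)$/mi', $rawHeaders, $matches) === 1) {
        // trim() drops the trailing "\r" left over from CRLF line endings.
        return trim($matches[1]);
    }

    return null;
}

// Assumed sample inputs, for illustration only.
$headers = "HTTP/1.1 302 Found\r\nLocation: https://example.com/next\r\nContent-Type: text/html\r\n\r\n";
$json = '{"Location": "Brussels", "title": "Some article"}';

var_dump(extractLocationHeader($headers)); // the redirect target
var_dump(extractLocationHeader($json));    // null, no header line present
```

This does nothing about the cookie wall itself; it only keeps the redirect check from treating JSON content as a redirect target.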