
URL unable to parse/Bypass robots.txt

Open · hugolundin opened this issue on Mar 27, 2017 · 2 comments

I have an HTTPS URL that WKZombie isn't able to parse. With other tools I've needed to bypass robots.txt, but there doesn't seem to be any such setting in WKZombie?

hugolundin · Mar 27 '17 07:03

Hi @hugolundin. No, currently there's no such setting. What are you trying to accomplish? Maybe changing the user agent or adjusting the HTTP headers would help?

mkoehnke · Mar 27 '17 20:03
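
For context, WKZombie drives a WKWebView internally but doesn't document a way to reach it or to set a user agent. If you can obtain (or patch in) a handle to the underlying web view, the platform API for this is WKWebView's `customUserAgent`. A minimal sketch, assuming such a handle exists (the user-agent string is a placeholder):

```swift
import WebKit

// Sketch only: WKZombie does not expose its internal web view, so this
// assumes you create or patch in a WKWebView handle yourself.
let webView = WKWebView(frame: .zero, configuration: WKWebViewConfiguration())

// Pretend to be desktop Safari; the exact string is a placeholder.
webView.customUserAgent =
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Safari/602.1.50"
```
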

I am trying to parse a website for some URLs. It has worked fine using Selenium with PhantomJS, and also with Mechanize in Python, but when I try it with WKZombie, the website loads until it logs "Unable to parse". The reason I suspected robots.txt is that Mechanize complained about it until I enabled its setting to bypass it.

Do you have any suggestions for common ways to change the user agent and/or the HTTP headers? Thank you very much for your reply!

hugolundin · Mar 28 '17 12:03
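
One common way to check whether the server is simply rejecting the default user agent is to request the page outside WKZombie with browser-like headers and compare the responses. A minimal diagnostic sketch using plain URLSession (the URL and header values are placeholders):

```swift
import Foundation

// Placeholder URL; substitute the page that fails to parse.
var request = URLRequest(url: URL(string: "https://example.com/")!)

// Browser-like headers; the exact values are placeholders.
request.setValue(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Safari/602.1.50",
    forHTTPHeaderField: "User-Agent")
request.setValue("text/html", forHTTPHeaderField: "Accept")

let task = URLSession.shared.dataTask(with: request) { data, response, _ in
    if let http = response as? HTTPURLResponse {
        print("Status:", http.statusCode)   // e.g. 403 often means the default UA is blocked
    }
    if let data = data, let body = String(data: data, encoding: .utf8) {
        print(body.prefix(500))             // inspect what the server actually returned
    }
}
task.resume()
```

If the page comes back fine here but fails in WKZombie, the user agent or headers are a likely culprit; if it fails here too, the server may be blocking by other means.
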