node-horseman
So many 'failed to GET url'
I'm just doing horseman.open('https://www.google.com') for testing, but I keep getting 'failed to GET url' at random times. It fails maybe 7 out of 10 times.
Any idea why?
Kicked the tires for this library following the docs for the project and saw a similar thing. Both Twitter and Google examples failed to run.
horseman v3.3.0, node v8.9.1
Tried on multiple hosts and noticed that the failure frequency varies, but I still get the same error eventually.
Bumping this topic, the same thing is happening to me.
Bumping this too. I'm getting it repeatedly, and I can't catch the errors either.
minotaurrr, Google detects scrapers and bans your IP address very quickly. That means you can only call horseman.open('http://google.com') ONCE every 5 minutes. If you want to scrape it more than once per 5 minutes, you need to:
- set up a proxy in the horseman options
- clear cookies with horseman.cookies()
- change the User-Agent in horseman
- also vary the value you pass to horseman.wait(value). If you always use the same interval between requests, Google will flag it.
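A rough sketch of the User-Agent and timing suggestions, in case it helps. The helper names (pickRandom, jitteredDelay), the User-Agent strings, and the 3 to 8 second range are my own placeholders, not anything horseman provides. The horseman calls in the comment (.userAgent, .open, .wait, .close) are the ones from its README, but I have not verified that this gets past Google's detection.

```javascript
// Hypothetical helpers: names and values are illustrative, not part of node-horseman.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5',
];

// Return one element of the array, chosen uniformly at random.
function pickRandom(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Return a random whole number of milliseconds in [minMs, maxMs],
// so consecutive requests are not evenly spaced.
function jitteredDelay(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Usage with node-horseman (needs the package and PhantomJS installed):
//
// const Horseman = require('node-horseman');
// const horseman = new Horseman();
// horseman
//   .userAgent(pickRandom(USER_AGENTS))
//   .open('https://www.google.com')
//   .wait(jitteredDelay(3000, 8000))
//   .close();
```

The point of jitteredDelay is only that the wait differs on every request; the exact range is a guess and would need tuning.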
Google must have banned your IP. Set a time interval between GET requests, or set up a list of proxies and cycle through them randomly.
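Something like the following could handle cycling through a proxy list in random order. The proxy addresses are placeholders and createProxyCycler / horsemanOptionsFor are made-up helper names. The proxy and proxyType keys are the option names listed in the horseman README; the 'http' type is an assumption about your proxies.

```javascript
// Placeholder proxy addresses (address:port); replace with real ones.
const PROXIES = ['127.0.0.1:8080', '127.0.0.1:8081', '127.0.0.1:8082'];

// Return a shuffled copy of the array (Fisher-Yates), leaving the input untouched.
function shuffled(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Hypothetical helper: returns a function that hands out proxies in random
// order, using every proxy once before reshuffling and starting over.
function createProxyCycler(proxies) {
  let queue = [];
  return function nextProxy() {
    if (queue.length === 0) {
      queue = shuffled(proxies);
    }
    return queue.pop();
  };
}

// Build the options object for one Horseman instance.
// Assumes HTTP proxies; adjust proxyType if yours are socks5.
function horsemanOptionsFor(proxy) {
  return { proxy: proxy, proxyType: 'http' };
}

// Usage with node-horseman (needs the package and PhantomJS installed):
//
// const Horseman = require('node-horseman');
// const nextProxy = createProxyCycler(PROXIES);
// const horseman = new Horseman(horsemanOptionsFor(nextProxy()));
// horseman.open('https://www.google.com').close();
```

Since the proxy is passed in the constructor options, each new proxy means a new Horseman instance, so close the old one before moving on.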