
User Agent and Proxies

Open shtefcs opened this issue 9 years ago • 5 comments

Hi there.

I tested your spider and it's working well. Two suggestions:

  • A userAgent option, so you can change it to whatever you want
  • Proxy support, so you can add proxies and crawl pages through them in the form PROXY:PORT:USERNAME:PASSWORD

shtefcs avatar Oct 10 '14 09:10 shtefcs

I added the userAgent config option to the CLI and config file. Tested and it appears to work on my side. @shtefcs, can you grab the latest and confirm?

seethroughdev avatar Oct 14 '14 20:10 seethroughdev
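For reference, setting the user agent in a CasperJS script normally goes through pageSettings (or the casper.userAgent() helper), which is presumably what the new config option maps to. A minimal sketch, with a placeholder UA string:

```js
// Hypothetical example: set the user agent when creating the casper instance.
var casper = require('casper').create({
  pageSettings: {
    userAgent: 'Mozilla/5.0 (compatible; my-crawler)' // placeholder UA string
  }
});

// CasperJS also exposes a helper to change it later:
casper.userAgent('Mozilla/5.0 (compatible; my-crawler)');

casper.start('http://example.com/', function () {
  this.echo('Loaded: ' + this.getTitle());
});

casper.run();
```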

Sorry for late reply and thank you for making the changes.

  • Is this the right way to change the user agent? Check my screenshot: http://screencast.com/t/fKscsLVX3
  • Is there a way to put the whole spider configuration into a frontend UI, and how would we do that? If we did, how would we start the scraper without the command line (casper.js spider.js)?
  • Also, is there a way to create a schedule, so we can set it up as a cron job or something like that?

shtefcs avatar Oct 17 '14 19:10 shtefcs
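On the frontend question: one way to start the crawl without typing the command by hand is to put a small Node server in front of it and shell out to the same command. This is only a rough sketch, not part of status-crawler; the route, binary name and paths below are made up:

```js
// Hypothetical sketch: a tiny Express server that triggers a crawl on request,
// so the spider can be started from a browser instead of the terminal.
var express = require('express');
var spawn = require('child_process').spawn;

var app = express();

app.get('/crawl', function (req, res) {
  // Run the same command you would otherwise run manually;
  // adjust the working directory, binary and script name to your setup.
  var job = spawn('casperjs', ['spider.js'], { cwd: '/path/to/status-crawler' });

  job.on('close', function (code) {
    res.send('Crawl finished with exit code ' + code);
  });
});

app.listen(3000);
```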

That should work fine for the user agent. You can definitely schedule any script with cron; just keep in mind this one requires node, so it might be easier to reproduce it in a bash script. As for a UI on the front end: yes, it can be done, but it's tricky to set up and manage. Hope that helps!

seethroughdev avatar Oct 28 '14 17:10 seethroughdev
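For the cron part, the usual approach is a small wrapper script plus a crontab entry, so cron only has to call one command. Paths, timing and the exact invocation below are placeholders and should be adapted to the actual install:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (e.g. run-crawl.sh) so cron only needs one command.
cd /path/to/status-crawler || exit 1
casperjs spider.js >> /var/log/status-crawler.log 2>&1
```

A matching crontab entry to run it every day at 02:00 would look like:

```
0 2 * * * /path/to/run-crawl.sh
```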

Tnx!

shtefcs avatar Oct 28 '14 20:10 shtefcs

Hello, do you have any idea how to add proxy support to this library?

PierreAmmeloot avatar Dec 27 '16 11:12 PierreAmmeloot
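Since CasperJS runs on top of PhantomJS, proxy settings can usually be passed as command-line switches when the script is launched rather than configured inside it. A hedged example, with placeholder host, port and credentials:

```
casperjs --proxy=127.0.0.1:8080 --proxy-type=http --proxy-auth=username:password spider.js
```

These are PhantomJS switches (--proxy, --proxy-type, --proxy-auth); the casperjs launcher passes them through to the underlying PhantomJS process.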