status-crawler
User Agent and Proxies
Hi there.
Hi there.
I tested your spider and it's working well. Two suggestions:
- User-agent option, so you can change it to whatever you want
- Proxy support, so you can add proxies and crawl pages through them, in the format PROXY:PORT:USERNAME:PASSWORD
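As a sketch of what that proxy format implies, a small parser (a hypothetical helper, not part of the library) could split the PROXY:PORT:USERNAME:PASSWORD string into the pieces a crawler would need:

```javascript
// Hypothetical helper: parse a "PROXY:PORT:USERNAME:PASSWORD" string
// into its parts. Not part of status-crawler; just illustrates the format.
function parseProxy(spec) {
  var parts = spec.split(':');
  if (parts.length !== 4) {
    throw new Error('Expected PROXY:PORT:USERNAME:PASSWORD, got: ' + spec);
  }
  return {
    host: parts[0],
    port: parseInt(parts[1], 10),
    username: parts[2],
    password: parts[3]
  };
}
```

The parsed parts could then be fed to whatever proxy mechanism the crawler ends up using, e.g. `parseProxy('127.0.0.1:8080:alice:secret')` yields host `127.0.0.1`, port `8080`, and the credentials.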
I added the userAgent config option to the CLI and config file. Tested, and it appears to work on my side. @shtefcs can you grab the latest and confirm?
Sorry for late reply and thank you for making the changes.
- Is this the right way to change the user agent? See my screenshot: http://screencast.com/t/fKscsLVX3 .
- Is there a way to put all of the spider options/configuration into a front-end UI, and how would we do that? If we did, how would we start the scraper without the command line
casper.js spider.js
? Also, is there a way to create a schedule, so we could set it up as a cron job or something similar?
That should work fine for the user agent. You can definitely schedule any script with cron; just keep in mind this one requires Node, so it might be easier to wrap it in a bash script. As for a front-end UI: yes, it can be done, but it's tricky to set up and manage. Hope that helps!
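For the cron idea above, a crontab entry along these lines should do it (the paths and schedule here are assumptions; adjust them for your install):

```
# Run the spider every day at 03:00; log output for debugging.
# /path/to/status-crawler and the log path are placeholders.
0 3 * * * cd /path/to/status-crawler && casperjs spider.js >> /var/log/spider.log 2>&1
```

Edit it in with `crontab -e`; make sure `casperjs` (and PhantomJS) are on the PATH cron uses, since cron runs with a minimal environment.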
Thanks!
Hello, do you have any idea how to add a proxy with this library?
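It isn't built into the spider itself, but PhantomJS (which CasperJS runs on) accepts proxy settings as command-line flags, and CasperJS forwards them. So something like this should work (host, port, and credentials are placeholders):

```
casperjs --proxy=127.0.0.1:8080 --proxy-type=http --proxy-auth=username:password spider.js
```

`--proxy-type` can also be `socks5` or `none`, per the PhantomJS command-line options.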