node-scraper
Easier web scraping using node.js and jQuery
What an awesome scraper platform! Got all geared up in no time. However, while I found single-page scraping to work just fine, parallel scraping with many pages (I had 79)...
I have a recursive script running and after about 100 scrapes I always get: FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory Initially I thought it was some...
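A common cause of this error is that a recursive scrape keeps every pending callback (and its closure over the parsed DOM) alive at once, so memory grows until V8 gives up. One mitigation is to cap how many scrapes run concurrently with a small queue; a minimal sketch, assuming a hypothetical `scrapeOne(url, cb)` in place of the real scraper call:

```javascript
// Fixed-concurrency queue: at most `limit` scrapes are in flight, so
// finished pages (and their parsed DOMs) can be garbage-collected
// instead of piling up in a deep recursion.
function runQueue(urls, limit, scrapeOne, done) {
  var results = [];
  var active = 0;
  var next = 0;

  function launch() {
    while (active < limit && next < urls.length) {
      active++;
      scrapeOne(urls[next++], function (err, result) {
        results.push(err || result);
        active--;
        if (next >= urls.length && active === 0) return done(results);
        launch();
      });
    }
  }
  launch();
}

// Hypothetical stand-in for the real scrape call.
function scrapeOne(url, cb) {
  setTimeout(function () { cb(null, 'ok:' + url); }, 0);
}

runQueue(['a', 'b', 'c', 'd'], 2, scrapeOne, function (results) {
  console.log(results.length); // 4
});
```

The same shape works for the 79-page parallel case above: feed all URLs in at once, and only `limit` of them are ever being fetched and parsed simultaneously.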
Please advise if I'm just a n00b and it's really obvious...
This is not an issue so much as a feature request, and I really don't know where else to put it, but I was wondering whether it would be a good...
How can I retrieve http code of the resulting page? like 200, 404, 503 etc.
From this url the compiled version of Goose cannot extract cleaned text: http://www.lastampa.it/2012/12/17/esteri/tecnico-italiano-rapito-in-siria-XHnMBpQFSLnYX3l2xHRvzI/pagina.html But on the demo website this url works. Why?
var $headers = jQuery('header a');
var $first = $($headers[0]);
console.log($headers.text()); // logs a bunch of header text
console.log($first.text());   // freezes but doesn't throw an error
maybe I'm doing something wrong?...
Hi, I'm scraping a page that contains links to other pages I'd like to scrape too (they have the following format, e.g. for paginated results). Was wondering if there is...
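One way to follow links discovered during a scrape is a small breadth-first crawl with a visited set, so paginated results that link to each other aren't fetched twice. A sketch under stated assumptions: `fetchLinks(url, cb)` is a hypothetical stand-in for scraping a page and extracting its pagination links:

```javascript
// Breadth-first crawl over discovered links, skipping pages already seen.
function crawl(startUrl, fetchLinks, done) {
  var visited = {};
  var queue = [startUrl];
  var order = [];

  function step() {
    if (queue.length === 0) return done(order);
    var url = queue.shift();
    if (visited[url]) return step();
    visited[url] = true;
    order.push(url);
    fetchLinks(url, function (err, links) {
      if (!err) queue = queue.concat(links);
      step();
    });
  }
  step();
}

// Hypothetical link graph standing in for real paginated pages.
var pages = { '/p1': ['/p2', '/p3'], '/p2': ['/p3'], '/p3': [] };
function fetchLinks(url, cb) {
  process.nextTick(function () { cb(null, pages[url] || []); });
}

crawl('/p1', fetchLinks, function (order) {
  console.log(order.join(',')); // /p1,/p2,/p3
});
```

This processes one page at a time; combining it with a concurrency cap is straightforward if the link set is large.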
It points to https://github.com/mikeal/node-utils/tree/master/request but it seems that https://github.com/mikeal/request is the new home.
Hi, this one is probably obvious, but how do I know which URL was parsed (if I pull several in parallel)? Also, the object described in the URL has several...
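A common pattern for this is to capture each URL in a closure so that every callback already knows which page it belongs to, even when the responses arrive out of order. A minimal sketch, with a hypothetical `fetchPage(url, cb)` in place of the real scraper call:

```javascript
var urls = ['/a', '/b', '/c'];

// Hypothetical stand-in for the real scrape call.
function fetchPage(url, cb) {
  process.nextTick(function () { cb(null, 'body of ' + url); });
}

urls.forEach(function (url) {
  // `url` is a fresh binding per iteration, so each callback
  // logs the URL its own response came from.
  fetchPage(url, function (err, body) {
    console.log(url + ' -> ' + body);
  });
});
```

Because `forEach` gives each iteration its own `url` parameter, this avoids the classic `for`-loop pitfall where every callback sees the loop variable's final value.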