node-scraper
Easier web scraping using node.js and jQuery
What an awesome scraper platform! Got all geared up in no time. However, while I found single-page scraping to work just fine, parallel scraping with many pages (I had 79)...
I have a recursive script running and after about 100 scrapes I always get: FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory Initially I thought it was some...
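A common cause of this error is that a recursive scrape keeps every pending callback (and its closure over the parsed DOM) alive at once, so memory grows until V8 gives up. One mitigation is to cap how many scrapes run concurrently with a small queue; a minimal sketch, assuming a hypothetical `scrapeOne(url, cb)` in place of the real scraper call:

```javascript
// Fixed-concurrency queue: at most `limit` scrapes are in flight, so
// finished pages (and their parsed DOMs) can be garbage-collected
// instead of piling up in a deep recursion.
function runQueue(urls, limit, scrapeOne, done) {
  var results = [];
  var active = 0;
  var next = 0;

  function launch() {
    while (active < limit && next < urls.length) {
      active++;
      scrapeOne(urls[next++], function (err, result) {
        results.push(err || result);
        active--;
        if (next >= urls.length && active === 0) return done(results);
        launch();
      });
    }
  }
  launch();
}

// Hypothetical stand-in for the real scrape call.
function scrapeOne(url, cb) {
  setTimeout(function () { cb(null, 'ok:' + url); }, 0);
}

runQueue(['a', 'b', 'c', 'd'], 2, scrapeOne, function (results) {
  console.log(results.length); // 4
});
```

The same shape works for the 79-page parallel case above: feed all URLs in at once, and only `limit` of them are ever being fetched and parsed simultaneously.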
Please advise if I'm just a n00b and it's really obvious...
This is not an issue so much as a feature request, and I really don't know where else to put it, but I was wondering whether it would be a good...
How can I retrieve http code of the resulting page? like 200, 404, 503 etc.
From this url the compiled version of Goose cannot extract cleaned text: http://www.lastampa.it/2012/12/17/esteri/tecnico-italiano-rapito-in-siria-XHnMBpQFSLnYX3l2xHRvzI/pagina.html But on the demo website this url works. Why?
var $headers = jQuery('header a');
var $first = $($headers[0]);
console.log($headers.text()); // logs a bunch of header text
console.log($first.text());   // freezes but doesn't throw an error
maybe I'm doing something wrong?...
Hi, I'm scraping a page that contains links to other pages I'd like to scrape too (they have the following format, e.g. for paginated results). Was wondering if there is...
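One way to follow links discovered during a scrape is a small breadth-first crawl with a visited set, so paginated results that link to each other aren't fetched twice. A sketch under stated assumptions: `fetchLinks(url, cb)` is a hypothetical stand-in for scraping a page and extracting its pagination links:

```javascript
// Breadth-first crawl over discovered links, skipping pages already seen.
function crawl(startUrl, fetchLinks, done) {
  var visited = {};
  var queue = [startUrl];
  var order = [];

  function step() {
    if (queue.length === 0) return done(order);
    var url = queue.shift();
    if (visited[url]) return step();
    visited[url] = true;
    order.push(url);
    fetchLinks(url, function (err, links) {
      if (!err) queue = queue.concat(links);
      step();
    });
  }
  step();
}

// Hypothetical link graph standing in for real paginated pages.
var pages = { '/p1': ['/p2', '/p3'], '/p2': ['/p3'], '/p3': [] };
function fetchLinks(url, cb) {
  process.nextTick(function () { cb(null, pages[url] || []); });
}

crawl('/p1', fetchLinks, function (order) {
  console.log(order.join(',')); // /p1,/p2,/p3
});
```

This processes one page at a time; combining it with a concurrency cap is straightforward if the link set is large.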
It points to https://github.com/mikeal/node-utils/tree/master/request but it seems that https://github.com/mikeal/request is the new home.
Hi, this one is probably obvious, but how do I know which URL was parsed (if I pull several in parallel)? Also, the object described in the URL has several...
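A common pattern for this is to capture each URL in a closure so that every callback already knows which page it belongs to, even when the responses arrive out of order. A minimal sketch, with a hypothetical `fetchPage(url, cb)` in place of the real scraper call:

```javascript
var urls = ['/a', '/b', '/c'];

// Hypothetical stand-in for the real scrape call.
function fetchPage(url, cb) {
  process.nextTick(function () { cb(null, 'body of ' + url); });
}

urls.forEach(function (url) {
  // `url` is a fresh binding per iteration, so each callback
  // logs the URL its own response came from.
  fetchPage(url, function (err, body) {
    console.log(url + ' -> ' + body);
  });
});
```

Because `forEach` gives each iteration its own `url` parameter, this avoids the classic `for`-loop pitfall where every callback sees the loop variable's final value.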