
Parallel scraping results in misses & duplicates

Open deggis opened this issue 13 years ago • 4 comments

What an awesome scraper platform! Got all geared up in no time.

However, while I found single-page scraping to work just fine, parallel scraping with many URLs (I had 79) fails, resulting in missed URLs and duplicates, even though the total count of fetched URLs is correct.

I suspect the cause is the queuing implementation. I tried a small fix in scraper.js that produced the results I was hoping for.

deggis avatar Jun 13 '11 21:06 deggis

I faced this problem too, with just 20 URLs rather than 79.

Is there a way to enforce a timeout?

gaara87 avatar Jun 27 '11 12:06 gaara87

From what I remember, a timeout wouldn't help; I think I tried that arrangement. I'm not a JS guru, so I'm not dead sure what a rock-solid fix for this would look like, but mine worked for me at least :)

deggis avatar Jun 28 '11 16:06 deggis

(Whoops, Comment & Close was kinda too close.)

deggis avatar Jun 28 '11 16:06 deggis

Try using cheerio instead of jsdom and implementing your own queuing; it worked for me!
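
For anyone landing here later: a minimal sketch of what "your own queuing" could look like. This is not the actual scraper.js fix from this thread; it's a hypothetical concurrency-limited queue where each URL is claimed by index exactly once, which rules out the misses and duplicates described above. `runQueue` and `worker` are illustrative names, and the worker is a stand-in for the real request + cheerio parsing step.

```javascript
// Hypothetical sketch: hand each URL to at most `concurrency` in-flight
// workers, claiming each index exactly once (no misses, no duplicates).
function runQueue(urls, concurrency, worker, done) {
  const results = new Array(urls.length); // one slot per URL
  let next = 0;      // index of the next URL to hand out
  let pending = 0;   // workers currently in flight
  let finished = 0;  // URLs completed

  function launch() {
    // Start workers until the limit is reached or the list is exhausted.
    while (pending < concurrency && next < urls.length) {
      const i = next++;  // claim an index; JS is single-threaded, so this is race-free
      pending++;
      worker(urls[i], function (err, result) {
        results[i] = err || result;
        pending--;
        finished++;
        if (finished === urls.length) return done(results);
        launch(); // refill the worker pool
      });
    }
  }
  launch();
}
```

Usage would be something like `runQueue(urls, 5, scrapeOne, printResults)`, where `scrapeOne(url, cb)` fetches the page, runs it through cheerio, and calls `cb(err, data)` once.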

nickewansmith avatar Jun 13 '13 17:06 nickewansmith