node-osmosis

.paginate is crashing

Open • danheidel opened this issue 8 years ago • 5 comments

I'm getting a fatal error when using paginate with the following query. If I set the paginate limit to 1, it works fine. However, if the limit is 2 or higher (up to the desired maximum of 10), the query crashes right at the end. From the output, it's clear that the inner .data callback is still firing after the outer .data callback has returned. With a pagination limit of 2, the final .data array is consistently missing one element. At the maximum limit of 10, it is consistently missing 5 results before the crash occurs.

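const osmosis = require('osmosis');

osmosis
  .get(' ... URL ...')
  // nested osmosis instance inside .set([...]); the crash below is thrown
  // from setArray in lib/commands/set.js
  .set([
  osmosis
    .find('div.ColumnSiteNav > div:first > a:nth(0)')
    .follow('@href')
    // works with a limit of 1, crashes with a limit of 2 or more
    .paginate('div.pagination-mini > ul > li:last > a', 10)
    .find('div#ContentColumn > div > div.row-fluid')
    .set({'date': 'div.span3 > p'})
    .follow('div.span7 > b > a')
    .set({
      'title':'div#ContentColumn > div > div.row-fluid > h3',
      'text':'div#ContentColumn > div > div.row-fluid > p'
    })
    .data(function(data){
      // inner callback: keeps firing after the outer callback has returned
      console.log(data.date);
    })
  ])
  .data(function(data){
    // outer callback: data.length comes up short when paginating
    console.log('done');
    console.log(data.length);
    //console.log(data);
  });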

The returned error message is:

.../llscraper/node_modules/osmosis/lib/commands/set.js:161
        data.push(val(context, data));
                  ^

TypeError: val is not a function
    at setArray (..../llscraper/node_modules/osmosis/lib/commands/set.js:161:19)
    at Data.<anonymous> (.../llscraper/node_modules/osmosis/lib/commands/set.js:157:17)
    at Data.unref (.../llscraper/node_modules/osmosis/lib/Data.js:120:29)
    at .../llscraper/node_modules/osmosis/lib/Command.js:169:14
    at Object.Data [as cb] (.../llscraper/node_modules/osmosis/lib/commands/data.js:19:5)
    at Object.Command.start (.../llscraper/node_modules/osmosis/lib/Command.js:159:21)
    at .../llscraper/node_modules/osmosis/lib/Command.js:166:18
    at dataDone (.../llscraper/node_modules/osmosis/lib/commands/set.js:97:17)
    at Object.cb (.../llscraper/node_modules/osmosis/lib/commands/set.js:140:17)
    at Object.Command.start (.../llscraper/node_modules/osmosis/lib/Command.js:159:21)

A quick answer would be greatly appreciated! This data is being scraped from a website that is being taken down at the end of the day.

danheidel, Oct 15 '16 18:10
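The symptom in the original report (the inner .data callback keeps firing after the outer one has returned, and results go missing only when the limit is 2 or more) is consistent with a completion counter that reaches zero too early. The stack trace does show osmosis tracking completion through Data.unref, but what follows is only a hypothetical model written for illustration, not osmosis's actual Data.js. The names Tracker, simulate and refBeforeSchedule are invented, and the model assumes each paginated page is queued asynchronously and only takes its reference once it starts running:

```javascript
// Hypothetical model of ref-counted completion. This is NOT osmosis's Data.js;
// Tracker, simulate() and refBeforeSchedule are invented for illustration.
class Tracker {
  constructor(onDone) {
    this.refs = 0;
    this.finished = false;
    this.onDone = onDone;
  }

  ref() {
    this.refs++;
  }

  unref() {
    this.refs--;
    // The outer callback fires the first time the count drops to zero.
    if (this.refs === 0 && !this.finished) {
      this.finished = true;
      this.onDone();
    }
  }
}

// Scrapes `pages` pages, one item per page. If refBeforeSchedule is false,
// the next page only takes its reference when it starts running, so the
// count can hit zero while that page is still queued.
function simulate(pages, refBeforeSchedule) {
  const results = [];
  let snapshot = null; // what the outer callback saw when it fired
  const tracker = new Tracker(() => {
    snapshot = results.slice();
  });
  const queue = []; // stands in for the event loop

  function loadPage(n) {
    results.push(`item-${n}`);
    if (n < pages) {
      if (refBeforeSchedule) tracker.ref(); // hold a ref for the queued page
      queue.push(() => {
        if (!refBeforeSchedule) tracker.ref(); // may be too late: done can already have fired
        loadPage(n + 1);
      });
    }
    tracker.unref(); // this page is finished
  }

  tracker.ref();
  loadPage(1);
  while (queue.length > 0) queue.shift()();
  return { snapshot, results };
}

console.log(simulate(1, false).snapshot); // [ 'item-1' ]
console.log(simulate(3, false).snapshot); // [ 'item-1' ]
console.log(simulate(3, true).snapshot);  // [ 'item-1', 'item-2', 'item-3' ]
```

The model reproduces the general shape of the report (a limit of 1 works, higher limits lose results) but not the exact counts, and it does not by itself explain the "val is not a function" crash.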

I am also facing the same issue ):

@danheidel, did you solve your problem?

jaspersorrio, May 01 '17 13:05

After some digging with console.log(val.toString()), I found that val is supposed to be:

function (context, data, done) {
        instance.start(context,
                       data.child()
                           .setIndex(index)
                           .done(done)
                           .ref());
}

In the crashing case, however, val is undefined.

@rchipka how should we move on from here?

jaspersorrio, May 02 '17 14:05
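The TypeError itself is what V8 reports whenever a variable holding undefined is called, as in data.push(val(context, data)) at set.js:161. The reduction below is hypothetical: it assumes, without confirmation from the osmosis source, that one callback is looked up per array slot by index, so an index past the end of the array yields undefined. setArray and setArrayGuarded here are illustrative stand-ins, not the real functions in lib/commands/set.js:

```javascript
// Hypothetical reduction of the failing call at set.js:161,
// `data.push(val(context, data))`. Not the real osmosis implementation.
function setArray(callbacks, index, context, data) {
  const val = callbacks[index]; // undefined if index >= callbacks.length
  data.push(val(context, data)); // throws "val is not a function" when undefined
  return data;
}

// A guarded variant that fails with a descriptive error instead.
function setArrayGuarded(callbacks, index, context, data) {
  const val = callbacks[index];
  if (typeof val !== 'function') {
    throw new RangeError(
      `no callback at index ${index} (only ${callbacks.length} defined)`
    );
  }
  data.push(val(context, data));
  return data;
}

const callbacks = [(context) => `${context}:ok`];
console.log(setArray(callbacks, 0, 'page1', [])); // [ 'page1:ok' ]

try {
  setArray(callbacks, 1, 'page2', []);
} catch (err) {
  console.log(err.message); // val is not a function
}
```

A guard like this would not fix the underlying pagination problem, but it would turn an opaque crash into an error that says which slot was missing.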

Also seeing this issue. In addition, the result set I receive when using paginate is different every time I run my script, even though the test content I am scraping does not change.

brianblakely, May 14 '17 23:05
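When results vary between runs, it can help to check whether two runs differ only in order or also in content. A minimal sketch, assuming each scraped record can be keyed by a field such as title (the helper name diffRuns and the choice of key are hypothetical, not part of osmosis):

```javascript
// Hypothetical helper for comparing two scrape runs by a key field.
// Keys are collected into Sets, so duplicate records are collapsed.
function diffRuns(runA, runB, key) {
  const keysA = new Set(runA.map((r) => r[key]));
  const keysB = new Set(runB.map((r) => r[key]));
  const onlyInA = [...keysA].filter((k) => !keysB.has(k));
  const onlyInB = [...keysB].filter((k) => !keysA.has(k));
  return {
    onlyInA,
    onlyInB,
    // Same key set: any remaining difference is ordering (or duplicates).
    sameContent: onlyInA.length === 0 && onlyInB.length === 0,
    // Same records in exactly the same positions.
    sameOrder:
      runA.length === runB.length &&
      runA.every((r, i) => r[key] === runB[i][key]),
  };
}

const run1 = [{ title: 'a' }, { title: 'b' }, { title: 'c' }];
const run2 = [{ title: 'b' }, { title: 'a' }];
console.log(diffRuns(run1, run2, 'title'));
```

If sameContent is true but sameOrder is false, the runs differ only in ordering; entries in onlyInA or onlyInB point to records that were actually dropped in one run.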

Did anyone solve this? I'm getting the same error.

hjalmarsn, Dec 07 '17 19:12

Has anyone solved this problem? The question is still very relevant.

Ruinevo, Nov 25 '18 12:11