node-osmosis: Craigslist example doesn't work

The current example doesn't work because Craigslist changed the structure of their pages. The following will work:

var osmosis = require('osmosis');

osmosis
.get('www.craigslist.org/about/sites')
.find('h1 + div a')
.set('location')
.follow('@href')
.find('header + nav + div + div li > a')
.set('category')
.follow('@href')
.paginate('.totallink + a.button.next:first')
.find('p > a')
.follow('@href')
.set({
    'title':        'section > h2',
    'description':  '#postingbody',
    'subcategory':  'div.breadbox > span[4]',
    'date':         'time@datetime',
    'latitude':     '#map@data-latitude',
    'longitude':    '#map@data-longitude',
    'images':       ['img@src'],
    'id': 'p.postinginfo[0]'
})
.data(function(listing) {
    // do something with listing data
    console.log(listing);
})
.log(console.log)
.error(console.log)
.debug(console.log)

Dec 24 '16 08:12 jspenc72

@jspenc72 And they ban your IP pretty quick too :P

Dec 30 '16 03:12 roblav96

@roblav96 Yes they do. Slow the requests down to one or two per second and you'll be OK. You should be removed from the ban list after 48 hours.

Dec 30 '16 04:12 jspenc72
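The pacing of one or two requests per second can be sketched in plain JavaScript. This is only an illustration under assumptions: `createThrottle`, `reserveSlot`, and `throttled` are hypothetical names, not part of node-osmosis, and the 500 ms interval simply corresponds to about two requests per second.

```javascript
// Hypothetical helper (not part of node-osmosis): spaces out request start times.
// `now` is injectable so the slot arithmetic can be checked without real timers.
function createThrottle(minIntervalMs, now = Date.now) {
    let nextSlot = 0; // earliest time (ms) at which the next request may start

    // Returns how many milliseconds the caller should wait before its request.
    return function reserve() {
        const current = now();
        const start = Math.max(current, nextSlot);
        nextSlot = start + minIntervalMs;
        return start - current;
    };
}

// 500 ms between requests is roughly two requests per second.
const reserveSlot = createThrottle(500);

// Run `task` (any function that issues a single request) in its reserved slot.
function throttled(task) {
    return setTimeout(task, reserveSlot());
}
```

Calling `throttled(fn)` several times in a row queues the calls 500 ms apart instead of firing them all at once.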

Good call. Is there an integrated throttle/debounce in this lib?

Dec 30 '16 04:12 roblav96

Check out example 4 on the wiki

https://github.com/rchipka/node-osmosis/wiki#config

I think you may want to delay with setTimeout and then call the done() method.

Dec 30 '16 10:12 jspenc72
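A minimal sketch of the setTimeout-then-done() idea, assuming the callback passed to osmosis's `.then()` receives `(context, data, next, done)` as the node-osmosis docs describe. `pauseFor` is a hypothetical helper name, and its `timer` parameter exists only so the function can be exercised without waiting.

```javascript
// Hypothetical helper: builds a callback intended for osmosis's .then().
// Assumption: .then() calls it with (context, data, next, done).
function pauseFor(ms, timer = setTimeout) {
    return function (context, data, next, done) {
        timer(function () {
            next(context, data); // hand the current page on to the following command
            done();              // signal that this asynchronous step has finished
        }, ms);
    };
}

// Possible usage (requires `npm install osmosis`):
// osmosis.get('www.craigslist.org/about/sites')
//     .then(pauseFor(1000))
//     .find('h1 + div a')
//     .set('location')
//     .data(console.log);
```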

What does the delay method do though? From what I understand, it delays the next http call in the stack. Or does it delay the find method after calling get to allow js and DOM to load?

Dec 30 '16 18:12 roblav96

That is likely a viable option.

Dec 30 '16 18:12 jspenc72

Try it.

Dec 30 '16 18:12 jspenc72

"Or does it delay the find method after calling get to allow js and DOM to load?"

I don't think it does this, although I'd like for it to be able to. Does anyone know how to delay a find after a get call?

Feb 14 '17 16:02 jimaldon

@jimdsouza https://rchipka.github.io/node-osmosis/Command.html#delay

Delay each context before continuing down the chain.

Using .delay() after a call to .get() will delay the execution of the command that follows .delay().

Feb 14 '17 16:02 rchipka
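The placement described above, with `.delay()` between `.get()` and the command it should hold back, might look like the sketch below. `listLocations` is a hypothetical wrapper, the selector is borrowed from the example at the top of the thread, and the delay value is a placeholder: the thread does not say whether the argument is read as seconds or milliseconds, so check the linked documentation.

```javascript
// Sketch only: `osmosis` is passed in so the chain can be built against the
// real module or a stand-in object with the same chainable methods.
function listLocations(osmosis, delayValue) {
    return osmosis
        .get('www.craigslist.org/about/sites')
        .delay(delayValue)      // hold each context before the next command
        .find('h1 + div a')     // the command that follows .delay()
        .set('location');
}

// Possible usage (requires `npm install osmosis`):
// listLocations(require('osmosis'), 1000).data(console.log);
```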
