node-osmosis

Craigslist example doesn't work

Open jspenc72 opened this issue 8 years ago • 9 comments

The current example doesn't work because Craigslist changed the structure of their pages. The following will work:

var osmosis = require('osmosis');

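osmosis
// Start from the list of all Craigslist sites.
.get('www.craigslist.org/about/sites')
// Each of these links is one location.
.find('h1 + div a')
.set('location')
.follow('@href')
// On the location page, each of these links is a category.
.find('header + nav + div + div li > a')
.set('category')
.follow('@href')
// Walk through every results page of the category.
.paginate('.totallink + a.button.next:first')
// Open each listing found on the results page.
.find('p > a')
.follow('@href')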
.set({
    'title':        'section > h2',
    'description':  '#postingbody',
    'subcategory':  'div.breadbox > span[4]',
    'date':         'time@datetime',
    'latitude':     '#map@data-latitude',
    'longitude':    '#map@data-longitude',
    'images':       ['img@src'],
    'id':           'p.postinginfo[0]'
})
.data(function(listing) {
    // do something with listing data
    console.log(listing);
})
.log(console.log)
.error(console.log)
.debug(console.log)

Dec 24 '16 08:12 jspenc72

@jspenc72 And they ban your IP pretty quick too :P

Dec 30 '16 03:12 roblav96

@roblav96 Yes, they do. Slow the requests down to only one or two per second and you'll be OK. You should be removed from the ban list after 48 hours.
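
One way to hold requests to roughly that rate is to work out how long each request should wait before it is sent. The sketch below is not part of osmosis: `createPacer` and `waitFor` are hypothetical names used only for illustration, and the caller passes in the current time so the logic can be checked without real timers.

```javascript
// Hypothetical helper, not an osmosis API.
// Returns a function that, given the current time in ms, reports how many
// ms the next request should wait so that requests are spaced at least
// `minIntervalMs` apart.
function createPacer(minIntervalMs) {
  let nextSlot = 0; // earliest time at which the next request may start

  return function waitFor(now) {
    const start = Math.max(now, nextSlot); // never schedule in the past
    nextSlot = start + minIntervalMs;      // reserve the following slot
    return start - now;                    // delay to apply before sending
  };
}
```

With `createPacer(500)`, each call returns the number of milliseconds to pass to `setTimeout` before issuing the next request, which caps the rate at about two per second.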

Dec 30 '16 04:12 jspenc72

Good call. Is there an integrated throttle/debounce in this lib?

Dec 30 '16 04:12 roblav96

Check out example 4 on the wiki

https://github.com/rchipka/node-osmosis/wiki#config

I think you may want to delay with setTimeout and then call the done() method.
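
A rough sketch of that idea, assuming the `.then()` callback receives `(context, data, next, done)` as the osmosis documentation describes. `delayedStep` is a hypothetical helper, not part of the library, and the optional `timer` argument exists only so the behavior can be checked without actually waiting.

```javascript
// Hypothetical helper, not an osmosis API.
// Builds a callback that waits `ms` milliseconds, passes the context and
// data on unchanged, and then signals that it has finished.
function delayedStep(ms, timer) {
  const schedule = timer || setTimeout; // allow a fake timer to be injected

  return function (context, data, next, done) {
    schedule(function () {
      next(context, data); // continue down the chain
      done();              // assumption: done() marks this step as complete
    }, ms);
  };
}
```

It would be used as `.then(delayedStep(1000))` between two commands in the chain.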

Dec 30 '16 10:12 jspenc72

What does the delay method do though? From what I understand, it delays the next http call in the stack. Or does it delay the find method after calling get to allow js and DOM to load?

Dec 30 '16 18:12 roblav96

That is likely a viable option.

Dec 30 '16 18:12 jspenc72

Try it.

Dec 30 '16 18:12 jspenc72

> Or does it delay the find method after calling get to allow js and DOM to load?

I don't think it does this, although I'd like for it to be able to. Does anyone know how to delay a find after a get call?

Feb 14 '17 16:02 jimaldon

@jimdsouza https://rchipka.github.io/node-osmosis/Command.html#delay

Delay each context before continuing down the chain.

Using .delay() after a call to .get() will delay the execution of the command that follows .delay().
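
As a sketch of where `.delay()` sits in a chain, here is the first step of the example at the top of this issue with a delay added. `listLocations` is a hypothetical wrapper that takes the `osmosis` module as an argument, and the assumption that an integer argument to `.delay()` is treated as milliseconds should be checked against the docs linked above.

```javascript
// Hypothetical wrapper; the osmosis module is passed in rather than required
// here so the chain can be exercised without network access.
function listLocations(osmosis, delayMs, onLocation) {
  return osmosis
    .get('www.craigslist.org/about/sites')
    .delay(delayMs)      // assumption: an integer is treated as milliseconds
    .find('h1 + div a')  // the command that follows .delay() is held back
    .set('location')
    .data(onLocation);
}
```

A real run would look like `listLocations(require('osmosis'), 1000, console.log)`.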

Feb 14 '17 16:02 rchipka