node-osmosis
Craigslist example doesn't work
The current example doesn't work since Craigslist changed the structure of their pages. The following will work:
var osmosis = require('osmosis');

osmosis
    .get('www.craigslist.org/about/sites')
    .find('h1 + div a')
    .set('location')
    .follow('@href')
    .find('header + nav + div + div li > a')
    .set('category')
    .follow('@href')
    .paginate('.totallink + a.button.next:first')
    .find('p > a')
    .follow('@href')
    .set({
        'title': 'section > h2',
        'description': '#postingbody',
        'subcategory': 'div.breadbox > span[4]',
        'date': 'time@datetime',
        'latitude': '#map@data-latitude',
        'longitude': '#map@data-longitude',
        'images': ['img@src'],
        'id': 'p.postinginfo[0]'
    })
    .data(function(listing) {
        // do something with listing data
        console.log(listing);
    })
    .log(console.log)
    .error(console.log)
    .debug(console.log)
@jspenc72 And they ban your IP pretty quick too :P
@roblav96 Yes they do. Slow the requests down to only one or two per second and you'll be OK. You should be removed from the ban list after 48 hours.
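For anyone who wants to enforce the "one or two per second" pace by hand, here is a minimal sketch in plain JavaScript. It does not use any osmosis API; `createSpacer` and `throttle` are hypothetical helper names made up for illustration, and the 500 ms gap is just an example value.

```javascript
// Minimal request spacer: hands out start times at least `minGapMs` apart.
// `now` is injectable so the logic can be checked without real waiting.
function createSpacer(minGapMs, now = Date.now) {
  let nextSlot = 0; // earliest time the next request may start

  // Returns how many milliseconds the caller should wait before starting.
  return function reserve() {
    const t = now();
    const start = Math.max(t, nextSlot);
    nextSlot = start + minGapMs;
    return start - t;
  };
}

// Wraps any promise-returning function so that calls start at least
// `minGapMs` apart, regardless of how quickly they are requested.
function throttle(fn, minGapMs) {
  const reserve = createSpacer(minGapMs);
  return (...args) =>
    new Promise((resolve) => setTimeout(resolve, reserve()))
      .then(() => fn(...args));
}
```

With a gap of 500 ms this allows at most two calls per second; a gap of 1000 ms allows one.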
Good call. Is there an integrated throttle/debounce in this lib?
Check out example 4 on the wiki
https://github.com/rchipka/node-osmosis/wiki#config
I think you may want to delay with setTimeout and then call the done() method.
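A rough sketch of that idea, assuming the callback passed to osmosis's `.then()` receives `(context, data, next, done)` as the library docs describe. `waitThenContinue` is a hypothetical helper name, and the `timer` parameter exists only so the sketch can be exercised without real waiting.

```javascript
// Builds a callback with the (context, data, next, done) shape that
// osmosis's .then() is assumed to pass. `timer` defaults to setTimeout.
function waitThenContinue(ms, timer = setTimeout) {
  return function (context, data, next, done) {
    timer(function () {
      next(context, data); // hand the unchanged context down the chain
      done();              // signal that this asynchronous step has finished
    }, ms);
  };
}

// Hypothetical usage, assuming `osmosis` is installed:
// osmosis.get(url).then(waitThenContinue(1000)).find('h1 + div a')
```

Whether this behaves any differently from the built-in delay command is untested here; treat it as a starting point.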
What does the delay method do though? From what I understand, it delays the next http call in the stack. Or does it delay the find method after calling get to allow js and DOM to load?
That is likely a viable option.
Try it.
Or does it delay the find method after calling get to allow js and DOM to load?
I don't think it does this, although I'd like for it to be able to. Does anyone know how to delay a find after a get call?
@jimdsouza https://rchipka.github.io/node-osmosis/Command.html#delay
Delay each context before continuing down the chain.
Using .delay() after a call to .get() will delay the execution of the command that follows .delay().
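To make the ordering concrete, here is a small sketch of where `.delay()` would sit in a chain. `listLocations` is a hypothetical wrapper name, and `osmosis` is passed in as a parameter so the snippet stays self-contained. The value given to `.delay()` is forwarded untouched; its unit is an assumption to confirm against the Command#delay documentation linked above.

```javascript
// `osmosis` is passed in rather than required here; in a real script it
// would come from require('osmosis').
// `pause` is forwarded to .delay() as-is (unit not verified in this sketch).
function listLocations(osmosis, pause) {
  return osmosis
    .get('www.craigslist.org/about/sites')
    .delay(pause)        // per the docs quoted above, each context is held here
    .find('h1 + div a')  // the command after .delay() is the one that waits
    .set('location');
}

// Hypothetical usage:
// listLocations(require('osmosis'), pause).data(console.log);
```

Note that this only postpones the next command in the chain. Nothing in the quoted documentation says it waits for page JavaScript to run.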