gramophone
Including contents of script tag
Following the example, I issue this chained command:
http://bashmodernquantity.com/bash-modern-quantity/2014/1/10/wool-and-copper
var request = require('request');
var gramophone = require('gramophone');
request('http://bashmodernquantity.com/bash-modern-quantity/2014/1/10/wool-and-copper')
  .pipe(gramophone.stream({html: true, limit: 10}))
  .on('data', console.error.bind(console));
In the results, I'm clearly getting the contents of script tags from that page, consisting of variable names (even one-letter variable names).
Ah yes. Gramophone uses stripTags from underscore.string, which simply removes any tags and leaves only the enclosed text. As you spotted, Gramophone is then getting stuck on all the minified tokens in the script tag.
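To illustrate the behaviour described above, here is a minimal sketch of what a "remove the tags, keep the enclosed text" approach does to a page with an inline script. The `naiveStripTags` function is a simplified, hypothetical stand-in written for this example; it is not underscore.string's actual implementation, and the sample markup is made up.

```javascript
// Simplified stand-in for a tag stripper: deletes anything that looks like
// a tag and keeps all text in between. NOT underscore.string's real code.
function naiveStripTags(html) {
  return html.replace(/<\/?[^>]+>/g, '');
}

// Made-up sample page fragment with an inline, minified-looking script.
const sample =
  '<div id="page"><p>Wool and copper</p>' +
  '<script>var a=1;function b(c){return c}</script></div>';

const text = naiveStripTags(sample);

// The <script> tags are gone, but the JavaScript source between them
// survives as plain text and would be tokenized like ordinary prose.
console.log(text);
```

Under these assumptions, the script body ends up in the text alongside the paragraph content, which would explain one-letter variable names showing up as keywords.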
You might have to parse out the relevant bits of your webpage first before passing the html to gramophone. Something like:
var request = require('request');
var gramophone = require('gramophone');
var cheerio = require('cheerio');
var stream = gramophone.stream({html: true, limit: 10, score: true, stem: true});
stream.on('data', console.error.bind(console));
request('http://bashmodernquantity.com/bash-modern-quantity/2014/1/10/wool-and-copper', function(err, res, body) {
  if (err) return console.error('err: ' + err);
  var page = cheerio.load(body)('#page').html();
  stream.end(page);
});
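One caveat with selecting a container such as `#page`: if that element itself holds inline `<script>` or `<style>` elements, their contents would still be passed along. A rough sketch of a pre-cleaning step follows. The helper name `removeScriptAndStyle` and the sample markup are hypothetical, and a regular expression is only an approximation of real HTML parsing.

```javascript
// Hypothetical helper: drop <script> and <style> elements, including their
// contents, so only visible markup is left for keyword extraction.
// A regex is a rough approximation; it is not a full HTML parser.
function removeScriptAndStyle(html) {
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '');
}

// Made-up fragment standing in for the selected part of a page.
const html =
  '<div id="page"><style>p{color:red}</style><p>Wool and copper</p>' +
  '<script type="text/javascript">var a=1;</script></div>';

const cleaned = removeScriptAndStyle(html);

// Only the container and the paragraph remain.
console.log(cleaned);
```

The cleaned string could then be handed to `stream.end(...)` in place of `page`. Since cheerio is already loaded in the snippet above, calling `.remove()` on a `'script, style'` selection before `.html()` would likely be the more robust, parser-based way to get the same effect.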