Including contents of script tag

Open thomasqbrady opened this issue 10 years ago • 1 comments

Following the example, I ran this chained command against the page below:

http://bashmodernquantity.com/bash-modern-quantity/2014/1/10/wool-and-copper

var request = require('request');
var gramophone = require('gramophone');

request('http://bashmodernquantity.com/bash-modern-quantity/2014/1/10/wool-and-copper')
  .pipe(gramophone.stream({html: true, limit: 10}))
  .on('data', console.error.bind(console));

In the results, I'm clearly getting the contents of script tags from that page, consisting of variable names (even one-letter variable names).

thomasqbrady avatar Jan 24 '14 04:01 thomasqbrady

Ah yes. Gramophone uses stripTags from underscore.string, which simply removes the tags themselves and leaves the enclosed text in place, including the text inside script tags. As you spotted, Gramophone then picks up all the minified tokens from the script tag.

You might have to parse out the relevant bits of your webpage first before passing the html to gramophone. Something like:

var request = require('request');
var gramophone = require('gramophone');
var cheerio = require('cheerio');

var stream = gramophone.stream({html: true, limit: 10, score: true, stem: true});
stream.on('data', console.error.bind(console));

request('http://bashmodernquantity.com/bash-modern-quantity/2014/1/10/wool-and-copper', function(err, res, body) {
  if (err) return console.error('err: ' + err);
  // Keep only the markup inside #page, so scripts elsewhere in the document are skipped.
  var page = cheerio.load(body)('#page').html();
  stream.end(page);
});
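
If the part of the page you select still contains inline scripts, you could also drop script and style elements before handing the html over. Here is a minimal, dependency-free sketch of the idea. Both function names are hypothetical and not part of gramophone or underscore.string. `naiveStripTags` is only a rough stand-in for a "remove tags, keep text" step, not the library's actual code, and a regex is a blunt tool for HTML, so treat it as an illustration rather than a robust parser:

```javascript
// Rough stand-in for a "remove tags, keep text" step. This is an assumption
// for illustration, not underscore.string's actual implementation.
function naiveStripTags(html) {
  return html.replace(/<\/?[^>]+>/g, '');
}

// Hypothetical helper: remove <script> and <style> elements, contents included.
function dropScriptsAndStyles(html) {
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ');
}

var sample = '<div id="page"><p>Wool and copper</p>' +
  '<script>var a=1,b=function(c){return c}</script></div>';

var before = naiveStripTags(sample);
var after = naiveStripTags(dropScriptsAndStyles(sample));

console.log(before); // still contains the script source
console.log(after);  // only the paragraph text remains
```

The cleaned string could then be passed to `stream.end(...)` in place of `page`. Since cheerio is already loaded in the snippet above, removing the elements with a selector such as `$('script, style').remove()` before calling `.html()` should be the more robust route for real pages.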

bxjx avatar Jan 24 '14 05:01 bxjx