Can I use Blast without modifying the DOM?

Open ghost opened this issue 10 years ago • 9 comments

I've been looking all over for a library to split articles into paragraphs, sentences, and words. As you know, it's not as easy as splitting at every '.'
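
To illustrate the problem, here is a minimal sketch of the naive approach. The sample text is made up for this example; abbreviations and decimal numbers are enough to break it:

```javascript
// Naive sentence splitting: cut at every period.
// The sample text is an assumption chosen to show the failure mode.
const text = "Dr. Smith paid $3.50 for coffee. Then he left.";

const naive = text
  .split(".")
  .map((piece) => piece.trim())
  .filter((piece) => piece.length > 0);

console.log(naive);
// [ 'Dr', 'Smith paid $3', '50 for coffee', 'Then he left' ]
```

The text has two sentences, but the split yields four fragments and drops the terminal punctuation.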

Is there any way to use Blast to return an array of sentences without referencing the DOM elements?

OR

Is Blast based on some library logic somewhere?

Please and thank you for the assistance.

ghost avatar Dec 27 '14 01:12 ghost

Hey, Rachel!

Good question. I'll look into this tomorrow when I push the new release of Blast. Stay tuned.

julianshapiro avatar Jan 04 '15 02:01 julianshapiro

Hi, I would also be interested in using Blast on plain strings instead of the DOM. Is there an easy way to do this?

mphasize avatar Feb 16 '15 13:02 mphasize

Not at the moment. Thanks for reminding me of this. I plan on tackling this in a few weeks.

julianshapiro avatar Feb 25 '15 04:02 julianshapiro

I also need this and might be able to work on it. Have you started any work already?

franciscop avatar May 01 '15 05:05 franciscop

If you don't care about the DOM and only about extracting the sentences, there are two easy workarounds. The one I recommend:

var sentences = [];
var blasted = $("article").blast({ delimiter: "sentence", returnGenerated: true });
for (var key in blasted) {
  // returnGenerated also returns properties that are not sentences; those have non-integer keys
  if (blasted.hasOwnProperty(key) && key == parseInt(key, 10)) {
    sentences.push($(blasted[key]).text());
  }
}
// Now `sentences` is an array of sentences
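
The integer-key check above can be tried without jQuery. This is a minimal sketch that uses a hand-made array-like object as a hypothetical stand-in for the collection Blast returns (the `prevObject` property and the sample strings are assumptions for illustration):

```javascript
// Hypothetical stand-in for a jQuery collection: indexed entries plus extra properties.
const collection = { 0: "First.", 1: "Second.", length: 2, prevObject: null };

// Keep only the values stored under integer-like keys ("0", "1", ...).
function indexedValues(obj) {
  const values = [];
  for (const key in obj) {
    const isIndex = String(parseInt(key, 10)) === key;
    if (Object.prototype.hasOwnProperty.call(obj, key) && isIndex) {
      values.push(obj[key]);
    }
  }
  return values;
}

console.log(indexedValues(collection)); // [ 'First.', 'Second.' ]

// Array.from reads `length` and the indexed entries of any array-like object,
// so it skips the extra properties without a manual filter.
console.log(Array.from(collection)); // [ 'First.', 'Second.' ]
```

Because jQuery collections are array-like, `Array.from` (where available) should be a shorter route to the same list of elements.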

Second one. I think this is dirtier, but easier to understand:

$("article").blast({ delimiter: "sentence" });
var sentences = $('.blast').map(function(index, element){
  return $(element).text();
}).get();
// Now `sentences` is an array of sentences

This seems to answer the original question, "Is there any way to use Blast to return an array of sentences without referencing the DOM elements?". Is that okay?

franciscop avatar May 01 '15 06:05 franciscop

Thanks so much, @franciscop. I'll likely merge this in within 14 days. :)

julianshapiro avatar May 01 '15 20:05 julianshapiro

Glad you like it. If you need anything, please ping me (although the change is quite small and trivial).

franciscop avatar May 07 '15 10:05 franciscop

@racheldotey I had the same question as you and ended up using parse-latin, which parses text and generates an AST, together with js-traverse, a utility library that helps with finding data in an AST. Here is a code example:

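// Assumes a CommonJS release of parse-latin that exports the constructor
const ParseLatin = require('parse-latin')
const traverse = require('traverse')
const latin = new ParseLatin()

// Return the source text of every AST node of the given type
function latinParseAndSelect(text, type) {
  const ast = latin.parse(text);
  return traverse(ast).reduce(function (acc, x) {
    // traverse also visits primitive leaves, so guard before reading `type`
    if (x && x.type === type) acc.push(text.slice(x.position.start.offset, x.position.end.offset));
    return acc;
  }, []);
}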

Then use it like this:

> let text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book."
undefined
> latinParseAndSelect(text, 'SentenceNode')
[ 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.',
  'Lorem Ipsum has been the industry\'s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.' ]
> latinParseAndSelect(text, 'WordNode')
[ 'Lorem',
  'Ipsum',
  'is',
  'simply',
  'dummy',
  'text',
  'of',
  'the',
  'printing',
  'and',
  'typesetting',
  'industry',
  'Lorem',
  'Ipsum',
  'has',
  'been',
  'the',
  'industry\'s',
  'standard',
  'dummy',
  'text',
  'ever',
  'since',
  'the',
  '1500s',
  'when',
  'an',
  'unknown',
  'printer',
  'took',
  'a',
  'galley',
  'of',
  'type',
  'and',
  'scrambled',
  'it',
  'to',
  'make',
  'a',
  'type',
  'specimen',
  'book' ]
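
The same select-by-offset idea can be sketched without either dependency. The tree below is hand-built in the general shape of such an AST (node `type`, `children`, and `position` offsets); its values are assumptions for illustration, not real parser output, and `selectByType` is a hypothetical helper name:

```javascript
// Hand-built tree; offsets index into `sample`. Not real parser output.
const sample = "Hi there. Bye.";
const tree = {
  type: "RootNode",
  position: { start: { offset: 0 }, end: { offset: 14 } },
  children: [
    {
      type: "ParagraphNode",
      position: { start: { offset: 0 }, end: { offset: 14 } },
      children: [
        { type: "SentenceNode", position: { start: { offset: 0 }, end: { offset: 9 } }, children: [] },
        { type: "WhiteSpaceNode", position: { start: { offset: 9 }, end: { offset: 10 } } },
        { type: "SentenceNode", position: { start: { offset: 10 }, end: { offset: 14 } }, children: [] },
      ],
    },
  ],
};

// Depth-first walk: collect the source slice of every node whose type matches.
function selectByType(text, node, type, acc = []) {
  if (node.type === type) {
    acc.push(text.slice(node.position.start.offset, node.position.end.offset));
  }
  for (const child of node.children || []) {
    selectByType(text, child, type, acc);
  }
  return acc;
}

console.log(selectByType(sample, tree, "SentenceNode"));
// [ 'Hi there.', 'Bye.' ]
```

Slicing the original string by offsets keeps punctuation and spacing exactly as written, which is why the sentences above come back with their trailing periods.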

tarikjn avatar Apr 08 '16 17:04 tarikjn

+1

serpulga avatar May 14 '17 20:05 serpulga