blast
Can I use Blast without modifying the DOM?
I've been looking all over for a library to split articles into paragraphs, sentences, and words. As you know, it's not as easy as splitting at every '.'.
Is there any way to use Blast to return an array of sentences without referencing the DOM elements?
OR
Is Blast based on some library logic somewhere?
Please and thank you for the assistance.
Hey, Rachel!
Good question. I'll look into this tomorrow when I push the new release of Blast. Stay tuned.
Hi, I would also be interested in using Blast on plain strings instead of the DOM. Is there an easy way to do this?
Not at the moment. Thanks for reminding me of this. I plan on tackling this in a few weeks.
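For anyone landing here in the meantime, a minimal sketch of splitting a plain string into sentences with no DOM involved. It does not use Blast at all: it relies on the built-in `Intl.Segmenter`, which is only available in reasonably recent browsers and Node.js releases, and `splitSentences` is a hypothetical helper name. Its idea of a sentence boundary follows the locale rules and will not necessarily match what Blast produces.

```javascript
// Sketch only: built-in Intl.Segmenter, not Blast.
// `splitSentences` is a hypothetical helper name.
function splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const sentences = [];
  for (const { segment } of segmenter.segment(text)) {
    // Each segment keeps its trailing whitespace, so trim it off
    const trimmed = segment.trim();
    if (trimmed) sentences.push(trimmed);
  }
  return sentences;
}

const demo = splitSentences("Hello world. How are you?");
console.log(demo);
```

Abbreviations such as "Dr." may still be treated as sentence ends, so check the output against your own text before relying on it.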
I also need this and might be able to work on it. Have you started any work already?
If you don't care about the DOM and only about extracting the sentences, there are two easy workarounds. The one I recommend:
var sentences = [];
var blasted = $("article").blast({ delimiter: "sentence", returnGenerated: true });
for (var key in blasted) {
  // The object returned with returnGenerated also has non-integer keys that are not sentences; skip those
  if (blasted.hasOwnProperty(key) && key == parseInt(key, 10)) {
    sentences.push($(blasted[key]).text());
  }
}
// Now `sentences` is an array of sentences
The second one is dirtier, I think, but easier to understand:
$("article").blast({ delimiter: "sentence" });
var sentences = $('.blast').map(function (index, element) {
  return $(element).text();
}).get();
// Now `sentences` is an array of sentences
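The key filter in the first workaround can look odd at first, so here is a small sketch of what it does, run against a hand-made object instead of a real jQuery result. The `mock` object and its contents are made up for illustration; the only assumption carried over is that the items sit under integer keys alongside other bookkeeping properties.

```javascript
// Hypothetical stand-in for an array-like result: integer keys hold the
// items, the remaining keys are bookkeeping and should be skipped.
const mock = {
  0: "First sentence.",
  1: "Second sentence.",
  length: 2,
  selector: "article",
};

const picked = [];
for (const key in mock) {
  // Keys are strings. "0" == 0 is true, while "length" and "selector"
  // parse to NaN, and nothing is == NaN, so they are filtered out.
  if (Object.prototype.hasOwnProperty.call(mock, key) && key == parseInt(key, 10)) {
    picked.push(mock[key]);
  }
}
console.log(picked);
```

The loose `==` is deliberate here: it compares the string key with its parsed number, which only matches for keys that are plain integers.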
This seems to answer the original request, "Is there any way to use Blast to return an array of sentences without referencing the DOM elements?" Is that okay?
Thanks so much, @franciscop. I'll likely merge this in within 14 days. :)
Glad you like it. If you need anything, please ping me (although the change is quite small and trivial).
@racheldotey I had the same question as you and ended up using parse-latin, which parses text and generates an AST, together with js-traverse, a utility library that helps with finding data in an AST. Here is a code example:
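// Assumes a CommonJS release of parse-latin whose module export is the constructor
const ParseLatin = require('parse-latin')
const traverse = require('traverse')
const latin = new ParseLatin()

// Return the source text of every AST node whose type matches `type`
function latinParseAndSelect(text, type) {
  const ast = latin.parse(text);
  return traverse(ast).reduce(function (acc, x) {
    // Slice the original string using the node's start and end offsets
    if (x && x.type === type) acc.push(text.slice(x.position.start.offset, x.position.end.offset));
    return acc;
  }, []);
}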
Then use like this:
> let text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book."
undefined
> latinParseAndSelect(text, 'SentenceNode')
[ 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.',
'Lorem Ipsum has been the industry\'s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.' ]
> latinParseAndSelect(text, 'WordNode')
[ 'Lorem',
'Ipsum',
'is',
'simply',
'dummy',
'text',
'of',
'the',
'printing',
'and',
'typesetting',
'industry',
'Lorem',
'Ipsum',
'has',
'been',
'the',
'industry\'s',
'standard',
'dummy',
'text',
'ever',
'since',
'the',
'1500s',
'when',
'an',
'unknown',
'printer',
'took',
'a',
'galley',
'of',
'type',
'and',
'scrambled',
'it',
'to',
'make',
'a',
'type',
'specimen',
'book' ]
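To show what the traversal is doing without installing anything, here is a rough sketch that walks a hand-written tree. The tree's shape (`type`, `children`, `position.start.offset`, `position.end.offset`) is an assumption modelled on the fields used in the snippet above, not real parser output, and `selectByType` is a hypothetical name.

```javascript
// Hand-written tree standing in for a parsed AST (illustrative only).
const source = "Hi there. Bye.";
const tree = {
  type: "RootNode",
  position: { start: { offset: 0 }, end: { offset: 14 } },
  children: [
    { type: "SentenceNode", position: { start: { offset: 0 }, end: { offset: 9 } }, children: [] },
    { type: "WhiteSpaceNode", position: { start: { offset: 9 }, end: { offset: 10 } } },
    { type: "SentenceNode", position: { start: { offset: 10 }, end: { offset: 14 } }, children: [] },
  ],
};

// Depth-first walk: collect the source slice of every node of the given type.
function selectByType(node, type, text, acc = []) {
  if (node.type === type) {
    acc.push(text.slice(node.position.start.offset, node.position.end.offset));
  }
  for (const child of node.children || []) {
    selectByType(child, type, text, acc);
  }
  return acc;
}

const found = selectByType(tree, "SentenceNode", source);
console.log(found);
```

Slicing the original string by offsets, rather than rebuilding text from child nodes, keeps punctuation and spacing exactly as they appeared in the input.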
+1