
What do you think about integrating node-snowball?

Open arminrosu opened this issue 8 years ago • 4 comments

Hey all,

I need a stemmer for Romanian and I found node-snowball. What do you think about integrating it into natural?

Pros

  • support for multiple languages
  • separate module only for stemming

Cons

  • uses a C library, so it can't be browserified
  • the project is 2 years old and might be inactive; an October PR hasn't been merged

arminrosu avatar Mar 23 '16 12:03 arminrosu

Well, based on the number of issues we see on here about browserify, it seems like a large number of people are using natural that way, so I'm not sure we'd be able to integrate something like that right now. I would totally love to hear what everyone else thinks about it, though.

kkoch986 avatar Mar 31 '16 02:03 kkoch986

What about this one? https://github.com/mazko/jssnowball Is it browserifiable?

namirsab avatar Dec 16 '16 13:12 namirsab

Or allow setting a stemmer through a particular API, something like:

var natural = require('natural');

natural.setStemmer('en', nodeSnowball); // or whatever

If a stemmer is just a function that receives a word and returns a stem, that could be an option that lets everybody use natural on both the server and the client side, while also letting users who only need it server-side plug in external stemmers for different languages.
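
A minimal sketch of how such a pluggable stemmer registry might look. Nothing here is part of natural's actual API: `setStemmer`, `stem`, and `toyEnglishStemmer` are hypothetical names, and the toy stemmer only stands in for an external one such as node-snowball.

```javascript
// Hypothetical registry: language code -> stemmer function (word) => stem.
// This is an illustration of the proposal, not natural's real API.
const stemmers = new Map();

// Register a stemmer for a language. Any plain function qualifies,
// so the same registry would work on the server and in the browser.
function setStemmer(lang, stemFn) {
  if (typeof stemFn !== 'function') {
    throw new TypeError('stemmer must be a function (word) => stem');
  }
  stemmers.set(lang, stemFn);
}

// Look up the stemmer registered for `lang` and apply it to `word`.
function stem(lang, word) {
  const stemFn = stemmers.get(lang);
  if (!stemFn) {
    throw new Error('no stemmer registered for language: ' + lang);
  }
  return stemFn(word);
}

// Toy stand-in for an external stemmer; it strips a few common
// suffixes and is NOT a real stemming algorithm.
function toyEnglishStemmer(word) {
  return word.toLowerCase().replace(/(ing|ed|s)$/, '');
}

setStemmer('en', toyEnglishStemmer);
```

With this shape, a server-only user could register a native stemmer for one language while browser users keep pure-JS ones, since the registry only ever sees a function.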

namirsab avatar Dec 16 '16 14:12 namirsab

I've been looking into this, and it would be possible to fork the Snowball code generator (https://github.com/snowballstem/snowball/blob/743b5af/compiler/generator_js.c#L1272) to emit modern JS (or natural-specific) stemmers. (Personally, I also want to make an AssemblyScript version.)

Another option is to integrate https://github.com/MrRefactoring/multilingual-stemmer, which has the Snowball stemmers compiled to WebAssembly via Rust.

forivall avatar Nov 20 '19 19:11 forivall