natural
What do you think about integrating node-snowball?
Hey all,
I need a stemmer for Romanian and I found node-snowball. What do you think about integrating it into natural?
Pros
- support for multiple languages
- separate module only for stemming
Cons
- uses C library, can't be browserified
- project is 2 years old, might be inactive, an October PR hasn't been merged
Well, based on the number of issues we see on here re browserify, it seems like there's a large number of people using natural that way, so I'm not sure we'd be able to integrate something like that right now. Would totally love to hear what everyone else thinks about it though.
What about this one? https://github.com/mazko/jssnowball Is it browserifiable?
Or allow setting a stemmer through a particular API, something like:
var natural = require('natural');
// setStemmer is the proposed API here; natural does not currently provide it
natural.setStemmer('en', nodeSnowball); // or whatever
If a stemmer is just a function that receives a word and returns a stem, that could be an option that would allow everybody to use natural both server and client side, but would also allow those users who need this only server side to use external stemmers for different languages.
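To make the idea concrete, here is a minimal sketch of what such a pluggable registry could look like. Everything in it is hypothetical: `setStemmer`, `stem`, and `toyEnglishStemmer` are illustrative names, not part of natural's actual API, and the toy stemmer only strips a trailing "s" to stand in for a real one.

```javascript
// Hypothetical registry; none of these names exist in natural today.
const stemmers = new Map();

// Register a stemmer for a language code. A stemmer is assumed to be
// a plain function: (word) => stem.
function setStemmer(lang, stemmer) {
  if (typeof stemmer !== 'function') {
    throw new TypeError('stemmer must be a function (word) => stem');
  }
  stemmers.set(lang, stemmer);
}

// Stem a word with whatever stemmer was registered for that language.
function stem(lang, word) {
  const stemmer = stemmers.get(lang);
  if (!stemmer) {
    throw new Error('no stemmer registered for language: ' + lang);
  }
  return stemmer(word);
}

// Toy stemmer for illustration only: strips a single trailing "s".
const toyEnglishStemmer = (word) => word.replace(/s$/, '');

setStemmer('en', toyEnglishStemmer);
console.log(stem('en', 'cats')); // prints "cat"
```

With something like this, a server-side user could wrap a native stemmer in a function and register it, while the browser build would only ship pure-JS stemmers.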
I've been looking into things, and it would be possible to fork the snowball code generator (https://github.com/snowballstem/snowball/blob/743b5af/compiler/generator_js.c#L1272) to emit modern JS (or natural-specific) stemmers. (Personally, I also want to make an AssemblyScript version.)
Another option is to integrate https://github.com/MrRefactoring/multilingual-stemmer, which has the snowball stemmers compiled into wasm via Rust.