node-unfluff icon indicating copy to clipboard operation
node-unfluff copied to clipboard

Convert to front-end friendly, remove 'fs'

Open knod opened this issue 8 years ago • 13 comments

This module could pretty easily be converted to be front end friendly by removing the need for 'fs'. Since the stopwords files are the only things being accessed with 'fs', this could be solved pretty easily.

I created my own solution, but I'm not sure it was particularly elegant. My current thought is to create an object with a key of the language code and a value of the stopwords array. Do you have other thoughts about more elegant solutions?

knod avatar Dec 11 '16 12:12 knod

Supporting front-end use cases just hasn't ever come up before. I'd be worried that the stopwords data would be too large to be used in most front-end applications if it was bundled in.

ageitgey avatar Dec 14 '16 22:12 ageitgey

I'll admit I haven't tested it on a throttled system. It did work for me on a MacBook Pro, OSX 10.11, Chrome 55.0.2883.75 with no problems. Used it in the development of a Chrome extension.

The lists aren't actually that big. What I did at the time was turn those stopwords files into js files/node modules and require them in an object. I don't think a plain old object would be any more of a problem than that. We just need to test it on a slower system and see if it becomes an issue.

knod avatar Dec 15 '16 02:12 knod

I also was looking at node-unfluff for front end bro purposes. The solution offered by @knod works, an alternative could also be to utilize local HTML5 File IO when applicable, and fallback to fs when the browser does not support HTML5. This would leave the current structure untouched, and front-end users could work around that limitation in WebPack by setting up their configuration to supply an empty module for fs, bearing in mind that the extension would not work on browsers that don't support HTML5. But the list of users not on HTML5-compatible browsers is shrinking daily.

I'd be happy to code this and submit a PR, but I also wonder if this would be better suited to its own fork and module. A separate module via npm install unfluff-standalone or something similar.

aknobloch avatar Feb 15 '18 20:02 aknobloch

If it's a feature you want, feel free to either submit a PR or make a fork. I'm happy with whichever route works best for you. I just don't have the need to use this in a browser myself, so it's not something I ever attempted to support.

ageitgey avatar Feb 15 '18 21:02 ageitgey

Makes sense - but I notice there's a few PR's that address this issue while maintaining the robustness of the original design. However, these have been there for a while with little activity, so I'm wondering if there is a reason why they aren't being merged. The ideal solution would be to have this base updated with a solution so that separate maintenance on the stopwords files doesn't have to be done.

Thanks for the speedy reply, by the way!

aknobloch avatar Feb 15 '18 21:02 aknobloch

so I'm wondering if there is a reason why they aren't being merged.

A fair question! I don't have a good answer beyond I thought the existing PRs made the code slightly worse for me and I didn't need the new feature, so I did a bad job of ever getting around to doing anything with it :)

The ideal solution would be to have this base updated with a solution so that separate maintenance on the stopwords files doesn't have to be done.

Sure, that makes sense.

My issues with the current PRs are just that they required loading all stop words for all languages into memory even if when 99.9% of users would only need them for one language. Loading all languages not only would that make memory usage higher on the server side (100KB per process), but it seemed like a pretty bad solution for delivering the library to users client side. Is it really worth adding 100KB of useless data to to every page load for ever user for no reason?

I imagine there might be a more clever solution where you pre-generate a separate js bundle file per language or something like that. I just never looked into it.

But that's the honest answer - I didn't love the current solutions but I also didn't spend any time coming up with a better answer either. But I'm totally open to discussing it or revisiting my opinion or whatever.

ageitgey avatar Feb 15 '18 21:02 ageitgey

Gotcha, that makes perfect sense. And thanks for the explanation. I'll give the problem some more thought and submit a PR request if I come up with a solution that I think works well.

aknobloch avatar Feb 15 '18 21:02 aknobloch

Cool, thanks!

ageitgey avatar Feb 15 '18 22:02 ageitgey

Oh, cool. I didn't know that that was the problem with the PRs. That's good to know.

knod avatar Feb 23 '18 21:02 knod

I have made a solution which uses require for getting the stopword files. When combined with webpack and raw-loader you can make the download of the actual stopword files completely async and dependent on the relevant language of your content. If you would like to check it out, check my fork.

cjanietz avatar Apr 01 '18 21:04 cjanietz

Hopefully your PR gets accepted because I too would like this feature

bhowmikp avatar Apr 04 '18 05:04 bhowmikp

@cjanietz I really like your approach. I created a PR here: https://github.com/ageitgey/node-unfluff/pull/84

Let me know if you are OK with me pulling that in.

ageitgey avatar Apr 04 '18 19:04 ageitgey

I just tried to use this in an Electron app, which uses webpack to build the UI and non-UI parts of the app. It doesn't seem to support bundling txt files like along with the modules (as far as I can figure out). So if these were just hardcoded into js modules it would solve it there as well.

justinmchase avatar Nov 27 '21 23:11 justinmchase