4cat
Custom stopword lists
Would you consider adding a feature that lets users supply custom stopword lists (i.e. in different languages) to the tools that use them?
Thanks
Hi @oxygala, yes, I can see how that would be a useful feature. There are a couple of ways to go about this. Which way of providing the stopword lists to the processor(s) would be most convenient from your perspective?
I think a simple upload box under every relevant processor that uses word lists would do.
Both the 'Tokenise' and 'Filter by words or phrases' processors already allow providing custom word lists for filtering: the 'Always delete this words' option for Tokenise and the 'Custom word list' option for the Filter. Is there a specific processor you would like to be able to do this with that I'm not thinking of?
Yes, and so does hatebase, though a comma-separated list of words is not very practical when the list is long. Being able to upload a .txt or .csv file would be nice.
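To make the request concrete, here is a minimal sketch of how an uploaded .txt or .csv stopword list could be parsed and applied. This is not 4CAT's actual code: the function names `parse_stopwords` and `remove_stopwords` are hypothetical, and the sketch assumes a .txt file holds one word per line while a .csv file may hold words in any cell.

```python
import csv
import io


def parse_stopwords(text, is_csv=False):
    """Return a sorted, lowercased, de-duplicated list of stopwords.

    `text` is the decoded content of the uploaded file (hypothetical input).
    """
    words = set()
    if is_csv:
        # .csv: accept words in any cell of any row
        for row in csv.reader(io.StringIO(text)):
            words.update(cell.strip().lower() for cell in row)
    else:
        # .txt: assume one word per line
        words.update(line.strip().lower() for line in text.splitlines())
    words.discard("")  # drop blank lines and empty cells
    return sorted(words)


def remove_stopwords(tokens, stopwords):
    """Drop every token that matches a stopword, ignoring case."""
    stop = set(stopwords)
    return [token for token in tokens if token.lower() not in stop]


if __name__ == "__main__":
    dutch = parse_stopwords("de\nhet\n\nEen\n")
    print(remove_stopwords(["De", "kat", "zit", "op", "het", "dak"], dutch))
```

Accepting both layouts would mean a long list can be kept in a spreadsheet or a plain text editor and uploaded as-is, rather than pasted as one comma-separated line.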