Custom stopword lists

Open oxygala opened this issue 2 years ago • 4 comments

Would you consider adding a feature that lets users supply custom stopword lists (e.g. in different languages) to the tools that use them?

Thanks

oxygala avatar Mar 02 '23 14:03 oxygala

Hi @oxygala, yes, I can see how that would be a useful feature. There are a couple of ways to go about this. Which way of providing the stopword lists to the processor(s) would be most convenient from your perspective?

stijn-uva avatar Mar 02 '23 17:03 stijn-uva

I think a simple upload box under every relevant processor that uses word lists would do.

oxygala avatar Mar 03 '23 14:03 oxygala

Both the 'Tokenise' and 'Filter by words or phrases' processors already allow providing custom word lists for filtering: the 'Always delete this words' option for Tokenise and the 'Custom word list' option for the Filter. Is there a specific processor you would like to be able to do this with that I'm not thinking of?

stijn-uva avatar Mar 13 '23 14:03 stijn-uva

Yes, and so does hatebase. However, entering words separated by commas is not very practical when the list is long. Being able to upload a .txt or .csv file would be nice.

oxygala avatar Mar 28 '23 18:03 oxygala
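
For illustration, here is a minimal sketch of how the contents of an uploaded .txt or .csv word list could be turned into a stopword set. It is not 4CAT code: the function names `parse_stopwords` and `remove_stopwords` are hypothetical, and the sketch assumes the file has already been read into a string, with either one word per line or comma-separated words.

```python
import csv
import io


def parse_stopwords(text: str) -> set[str]:
    """Return lowercased, de-duplicated words from .txt or .csv content.

    Hypothetical helper, not part of 4CAT. Assumes one word per line
    and/or comma-separated words.
    """
    words = set()
    # csv.reader splits every line on commas, so the same loop handles
    # a plain one-word-per-line .txt and a comma-separated .csv.
    for row in csv.reader(io.StringIO(text)):
        for cell in row:
            word = cell.strip().lower()
            if word:  # skip blank lines and empty cells
                words.add(word)
    return words


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Drop tokens that appear in the stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]
```

A processor could then call `parse_stopwords` on the uploaded file's contents and pass the resulting set to its existing filtering step, instead of asking for a long comma-separated string in a text field.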