Could this be faster with a set instead of a list?
My colleague was working with this library for some NLP work, and he was trying to manipulate CENSOR_WORDS for reasons not particularly important to this question.
It got me wondering: wouldn't this all go a lot faster if CENSOR_WORDS were a set? Forgive me if I'm wasting your time; I didn't fully trace the code.
It seems to me that a lookup against a very large collection of words or phrases would always be faster with a set, because a set works as a hash table under the hood in Python.
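For a rough sense of the difference, here is a minimal membership benchmark (the word list is made up for illustration; it doesn't use better_profanity's actual wordset):

```python
import timeit

# Build a large collection of fake words (hypothetical data, not the
# library's real censor list).
words = [f"word{i}" for i in range(50_000)]
word_list = list(words)
word_set = set(words)

# Looking up a word near the end of the list forces a full linear scan,
# while the set lookup is a single hash probe on average.
list_time = timeit.timeit(lambda: "word49999" in word_list, number=1_000)
set_time = timeit.timeit(lambda: "word49999" in word_set, number=1_000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

On any recent CPython the set lookup should win by several orders of magnitude at this size.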
You are right that membership tests against a list are far slower than against a set. Here is what I did to solve the issue, assuming you don't edit the censor list afterwards:
from better_profanity import profanity, varying_string

# VaryingString instances are not hashable out of the box, so give them a
# hash based on the original string; this lets them live in a set.
varying_string.VaryingString.__hash__ = lambda self: hash(self._original)

# Make your edits to the censor list here, while CENSOR_WORDSET is still a list.

# Freeze the word list so lookups become hash probes instead of linear scans.
profanity.CENSOR_WORDSET = frozenset(profanity.CENSOR_WORDSET)
If you want everything to work, you would need to make all uses of CENSOR_WORDSET work with sets rather than lists. The code in the main file is only ~250 lines, so that would be easy enough. Otherwise, this gets the job done.
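Why the `__hash__` patch is needed can be shown with a toy class (VariantWord here is a made-up stand-in, not the library's actual VaryingString):

```python
# Hypothetical stand-in for a class that defines equality but no hash.
class VariantWord:
    def __init__(self, original):
        self._original = original

    # Defining __eq__ without __hash__ makes instances unhashable,
    # which is why such objects can't simply be passed to frozenset().
    def __eq__(self, other):
        return self._original == getattr(other, "_original", other)

words = [VariantWord("cat"), VariantWord("dog")]
try:
    frozenset(words)               # fails: VariantWord is unhashable
except TypeError as err:
    print("before patch:", err)

# Patch in a hash based on the underlying string, as in the answer above.
VariantWord.__hash__ = lambda self: hash(self._original)
wordset = frozenset(words)
print(VariantWord("cat") in wordset)  # True: membership is now a hash probe
```

Note that this relies on the hash agreeing with equality (equal objects must hash the same), which holds here because both are derived from the same underlying string.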