Could this be faster with a set instead of a list?
My colleague was working with this library for some NLP work, and he was trying to manipulate CENSOR_WORDS for reasons not particularly important to this question.
It got me wondering: wouldn't this all go a lot faster if CENSOR_WORDS were a set? Forgive me if I'm wasting your time; I didn't fully trace the code.
It seems to me that a lookup against a very large collection of words or phrases would always be faster with a set, because a set works as a hash table under the hood in Python.
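For a rough sense of the difference, here is a minimal membership benchmark (the word list is made up for illustration; it doesn't use better_profanity's actual wordset):

```python
import timeit

# Build a large collection of fake words (hypothetical data, not the
# library's real censor list).
words = [f"word{i}" for i in range(50_000)]
word_list = list(words)
word_set = set(words)

# Looking up a word near the end of the list forces a full linear scan,
# while the set lookup is a single hash probe on average.
list_time = timeit.timeit(lambda: "word49999" in word_list, number=1_000)
set_time = timeit.timeit(lambda: "word49999" in word_set, number=1_000)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

On any recent CPython the set lookup should win by several orders of magnitude at this size.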
You are right that membership tests against a list are far slower than against a set. Here is what I did to solve the issue, assuming you don't edit the censor list afterwards:
from better_profanity import profanity, varying_string

# VaryingString instances are not hashable out of the box, so give them a
# hash based on the original string; this lets them live in a set.
varying_string.VaryingString.__hash__ = lambda self: hash(self._original)

# Make your edits to the censor list here, while CENSOR_WORDSET is still a list.

# Freeze the word list so lookups become hash probes instead of linear scans.
profanity.CENSOR_WORDSET = frozenset(profanity.CENSOR_WORDSET)
If you want everything to work, you would need to make all uses of CENSOR_WORDSET work with sets rather than lists. The code in the main file is only ~250 lines, so that would be easy enough. Otherwise, this gets the job done.
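Why the `__hash__` patch is needed can be shown with a toy class (VariantWord here is a made-up stand-in, not the library's actual VaryingString):

```python
# Hypothetical stand-in for a class that defines equality but no hash.
class VariantWord:
    def __init__(self, original):
        self._original = original

    # Defining __eq__ without __hash__ makes instances unhashable,
    # which is why such objects can't simply be passed to frozenset().
    def __eq__(self, other):
        return self._original == getattr(other, "_original", other)

words = [VariantWord("cat"), VariantWord("dog")]
try:
    frozenset(words)               # fails: VariantWord is unhashable
except TypeError as err:
    print("before patch:", err)

# Patch in a hash based on the underlying string, as in the answer above.
VariantWord.__hash__ = lambda self: hash(self._original)
wordset = frozenset(words)
print(VariantWord("cat") in wordset)  # True: membership is now a hash probe
```

Note that this relies on the hash agreeing with equality (equal objects must hash the same), which holds here because both are derived from the same underlying string.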