flashtext
Unable to find/replace '_' with ' ' even when removing '_' from non word boundaries
I'm currently processing a list of 100k+ texts. Regex is incredibly slow for this, so I thought FlashText would be perfect. However, I'm unable to use FlashText to replace '_' with ' ':

    >>> from flashtext import KeywordProcessor
    >>> text = 'the_quick_brown fox jumps over_the fence'
    >>> proc = KeywordProcessor()
    >>> proc.non_word_boundaries.remove('_')
    >>> proc.add_keyword('_', ' ')
    True
    >>> proc.replace_keywords(text)
    'the_quick_brown fox jumps over_the fence'
Just do a simple text replace for this:

    string_val = string_val.replace('_', ' ')
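For the 100k+ texts mentioned in the question, the same idea can be applied to the whole list. This is only a minimal sketch, assuming the texts fit in memory as a list of strings. The names `texts` and `replace_underscores` are illustrative and not part of FlashText.

```python
def replace_underscores(texts):
    """Return a new list with every '_' replaced by a single space."""
    # str.replace handles each string in one call; no regex is involved.
    return [t.replace('_', ' ') for t in texts]


texts = ['the_quick_brown fox jumps over_the fence', 'no underscores here']
cleaned = replace_underscores(texts)
print(cleaned[0])  # the quick brown fox jumps over the fence
```

If several single characters need to be mapped at once, `str.translate` with a table built by `str.maketrans` is a possible alternative.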
Is it possible to replace keywords with "" (an empty string), as in the question, using a list of dictionary words? I would love to use your dictionary to remove certain keywords from a sentence.
string.replace() works, but I have a big list of dictionary words to remove.
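One way to remove a whole list of words without FlashText is a single compiled alternation from the standard `re` module. This is a stand-in technique and not FlashText's trie lookup. It may still be slow for very large word lists. The names `build_remover` and `remove_words` are hypothetical, and matching here is case-sensitive.

```python
import re


def build_remover(words):
    """Compile one pattern that matches any of the given words as whole words."""
    # Longest first, so that 'new york' wins over 'new' if both are listed.
    ordered = sorted(words, key=len, reverse=True)
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in ordered) + r')\b'
    return re.compile(pattern)


def remove_words(text, remover):
    """Delete every match, then collapse the whitespace left behind."""
    return ' '.join(remover.sub('', text).split())


remover = build_remover(['big', 'very', 'new york'])
print(remove_words('a very big day in new york today', remover))
# a day in today
```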
I tried the same, hoping it would help me replace the "+" symbol in millions of lines. When I tried:

    keyword_processor = KeywordProcessor()
    keyword_processor.add_keyword('+', '')
    keyword_processor.replace_keywords('I love Big Apple and + new Delhi.')

the result was 'I love Big Apple and + new Delhi.'
@reach4bawer
That's because an empty string is falsy, so keyword_processor.add_keyword('+', '') is treated exactly as if no clean_name had been passed (clean_name=None). See:

    bool("")
    >> False
This means that if we go down that rabbit hole and check out

    def add_keyword(self, keyword, clean_name=None):
        return self.__setitem__(keyword, clean_name)

we see that it uses:

    def __setitem__(self, keyword, clean_name=None):
        """To add keyword to the dictionary
        pass the keyword and the clean name it maps to.

        Args:
            keyword : string
                keyword that you want to identify
            clean_name : string
                clean term for that keyword that you would want to get back in return or replace
                if not provided, keyword will be used as the clean name also.

        Examples:
            >>> keyword_processor['Big Apple'] = 'New York'
        """
        status = False
        if not clean_name and keyword:
            clean_name = keyword
        [...]
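This means that a keyword is only replaced with the value you pass if clean_name is truthy, i.e. bool(clean_name) == True. Being "not None" is not enough: an empty string is not None, but it is falsy, so clean_name falls back to the keyword itself.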
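The fallback can be reproduced in isolation. The sketch below copies only the `if not clean_name and keyword` check from the excerpt quoted above. `resolve_clean_name` is a hypothetical helper written for illustration and is not a FlashText function.

```python
def resolve_clean_name(keyword, clean_name=None):
    """Mimic the quoted check: a falsy clean_name is replaced by the keyword."""
    if not clean_name and keyword:
        clean_name = keyword
    return clean_name


print(repr(resolve_clean_name('+', '')))    # '+'
print(repr(resolve_clean_name('+', None)))  # '+'
print(repr(resolve_clean_name('+', ' ')))   # ' '
```

Only the last call keeps the value that was passed in, because a single space is a non-empty and therefore truthy string.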
So in your case, you can either do:
    keyword_processor = KeywordProcessor()
    keyword_processor.add_keyword('+', ' ')  # note the whitespace
    keyword_processor.replace_keywords('I love Big Apple and + new Delhi.')
    >> 'I love Big Apple and new Delhi.'
Or just go with a simple .replace(), as @vi3k6i5 suggested in the first place:

    your_text = "I love Big Apple and + new Delhi."
    your_text = your_text.replace('+', '')
    print(your_text)
    >> I love Big Apple and  new Delhi.

(The spaces on either side of the '+' stay, so the result contains a double space.)
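Removing a symbol that sits between two spaces leaves a double space behind. If that matters, the whitespace can be collapsed in the same step. This is a minimal sketch using only built-in string methods, and `strip_symbol` is an illustrative name.

```python
def strip_symbol(text, symbol):
    """Remove every occurrence of symbol and normalise runs of whitespace."""
    without_symbol = text.replace(symbol, '')
    # split() with no arguments splits on any whitespace run and drops empties.
    return ' '.join(without_symbol.split())


print(strip_symbol('I love Big Apple and + new Delhi.', '+'))
# I love Big Apple and new Delhi.
```

Note that this also trims leading and trailing whitespace and turns tabs or newlines into single spaces, which may not be wanted for multi-line texts.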
@iwpnd Thank you for the explanation. Will use the replace function.