Unable to find/replace '_' with ' ' even when removing '_' from non word boundaries

Open ezekielg opened this issue 5 years ago • 5 comments

I'm currently processing a list of 100k+ texts. Regex is incredibly slow for this, so I thought FlashText would be perfect. However, I'm unable to use FlashText to replace '_' with ' ':

    >>> text = 'the_quick_brown fox jumps over_the fence'
    >>> proc = KeywordProcessor()
    >>> proc.non_word_boundaries.remove('_')
    >>> proc.add_keyword('_', ' ')
    True
    >>> proc.replace_keywords(text)
    'the_quick_brown fox jumps over_the fence'

ezekielg avatar Apr 06 '19 20:04 ezekielg

Just do a simple text replace for this:

string_val = string_val.replace('_', ' ')
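
For a list of 100k+ texts, the same idea can be applied to every entry with a list comprehension. The following is a minimal sketch using only built-in string methods, not FlashText; the `texts` list is made-up sample data, and `str.translate` is shown as an option for when several single characters need to be mapped at once:

```python
# Hypothetical sample data standing in for the 100k+ texts.
texts = [
    "the_quick_brown fox jumps over_the fence",
    "no underscores here",
]

# str.replace swaps every occurrence of the substring,
# regardless of word boundaries.
cleaned = [t.replace("_", " ") for t in texts]

# str.translate applies several single-character mappings in one pass.
table = str.maketrans({"_": " ", "+": " "})
cleaned_multi = [t.translate(table) for t in texts]

print(cleaned[0])  # the quick brown fox jumps over the fence
```

Both calls return new strings, so the original list is left untouched.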

vi3k6i5 avatar Apr 06 '19 20:04 vi3k6i5

Is it possible to replace something with "", as in the question, using a list of dictionary words? I would love to use a dictionary to remove certain keywords from a sentence.

string.replace() works, but I have a big list of dictionary words to remove.
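
One way to delete many words in a single pass without chaining `.replace()` calls is a compiled regular expression with one alternative per word. This is a sketch of a swapped-in standard-library technique, not FlashText, and it may be slower than FlashText for very large lists. The `words_to_remove` list and the `remove_keywords` helper are hypothetical names, and the `\b` anchors assume every entry starts and ends with a word character:

```python
import re

# Hypothetical word list; in practice this would be loaded from your dictionary.
words_to_remove = ["quick", "brown", "over the"]

# Longest entries first so multi-word entries win over their prefixes;
# re.escape guards against regex metacharacters inside the entries.
alternatives = sorted(map(re.escape, words_to_remove), key=len, reverse=True)
pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


def remove_keywords(text: str) -> str:
    """Delete every listed keyword, then collapse leftover runs of spaces."""
    stripped = pattern.sub("", text)
    return re.sub(r" {2,}", " ", stripped).strip()


print(remove_keywords("the quick brown fox jumps over the fence"))
# the fox jumps fence
```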

brlala avatar Apr 30 '19 08:04 brlala

I tried the same in the hope that it would help me process millions of lines that contain the "+" symbol. When I tried:

keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('+', '')
keyword_processor.replace_keywords('I love Big Apple and + new Delhi.')

The result was 'I love Big Apple and + new Delhi.'

reach4bawer avatar Feb 17 '20 03:02 reach4bawer

@reach4bawer

That's because keyword_processor.add_keyword('+', '') is technically treated the same as passing no clean_name at all: the empty string is falsy. See:

bool("")
>> False

This means if we go down that rabbit hole and check out

    def add_keyword(self, keyword, clean_name=None):
        
        return self.__setitem__(keyword, clean_name)

we see that it uses:

    def __setitem__(self, keyword, clean_name=None):
        """To add keyword to the dictionary
        pass the keyword and the clean name it maps to.
        Args:
            keyword : string
                keyword that you want to identify
            clean_name : string
                clean term for that keyword that you would want to get back in return or replace
                if not provided, keyword will be used as the clean name also.
        Examples:
            >>> keyword_processor['Big Apple'] = 'New York'
        """
        status = False
        if not clean_name and keyword:
            clean_name = keyword
            [...]

This means that a replacement only takes effect if clean_name is truthy, that is, neither None nor an empty string (bool(clean_name) == True). Otherwise the keyword is simply mapped to itself.
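
To see the fallback in isolation, here is a small stand-alone mimic of the check quoted above. It is not FlashText's code; `resolve_clean_name` is a hypothetical helper that copies only the `if not clean_name and keyword` condition:

```python
def resolve_clean_name(keyword, clean_name=None):
    """Stand-alone mimic of the quoted check; not part of FlashText."""
    # Same condition as in __setitem__: any falsy clean_name ('' or None)
    # is replaced by the keyword itself.
    if not clean_name and keyword:
        clean_name = keyword
    return clean_name


print(repr(resolve_clean_name("+", "")))    # '+'
print(repr(resolve_clean_name("+", None)))  # '+'
print(repr(resolve_clean_name("+", " ")))   # ' '
```

An empty string and None both end up mapping '+' to '+', which is why the sentence comes back unchanged.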

So in your case, you can either do:

keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('+', " ") # note the whitespace
keyword_processor.replace_keywords('I love Big Apple and + new Delhi.')
>> 'I love Big Apple and  new Delhi.'

Or just go with a simple .replace(), as @vi3k6i5 suggested in the first place:

your_text = "I love Big Apple and + new Delhi."
your_text = your_text.replace('+', '')
print(your_text)
>> I love Big Apple and  new Delhi.

iwpnd avatar Feb 17 '20 14:02 iwpnd

@iwpnd Thank you for the explanation. Will use the replace function.

reach4bawer avatar Feb 23 '20 05:02 reach4bawer