Ch 8: applying preprocessor to DataFrame raises TypeError

Open DanyaLearning opened this issue 1 year ago • 1 comments

df['review'] = df['review'].apply(preprocessor)

This raises a TypeError: expected string or bytes-like object, got 'float'

With a small modification, I got the code to work by wrapping the text in str() so that it is always a string. Not sure if this is the proper way to do it, but it works for me.

Here is the modified preprocessor, which runs without errors:

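import re

def preprocessor(text):
    # str() added here so non-string values (e.g. floats) don't raise a TypeError
    text = re.sub('<[^>]*>', '', str(text))
    # collect emoticons such as :) ;-( =D before punctuation is stripped
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',
                           text)
    # lowercase, replace non-word characters with spaces, append emoticons without the '-' nose
    text = (re.sub(r'[\W]+', ' ', text.lower()) +
            ' '.join(emoticons).replace('-', ''))
    return text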

DanyaLearning, Jan 09 '25 14:01

Thanks for the note, and sorry for the late follow-up! I do think your modification would correctly fix issues where the "review" is a float. But I think there may be a bigger issue here: "review" shouldn't be a float. I wonder if there was perhaps an issue with the data frame loading before that line.
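To illustrate why the str() workaround can hide a problem rather than solve it, here is a minimal sketch. It assumes the float in question is a missing value (pandas represents missing cells as float NaN); that is only one possible cause, not something confirmed in this thread. The helper name strip_tags is made up for the example and just repeats the first step of the preprocessor.

```python
import re

def strip_tags(text):
    # Hypothetical helper: same first step as the preprocessor (remove HTML-like tags).
    return re.sub('<[^>]*>', '', text)

# Assumption: the offending value is a missing cell, i.e. a float NaN.
missing = float('nan')

try:
    strip_tags(missing)
    raised = False
except TypeError:
    # re.sub only accepts str or bytes-like input, so a float fails here.
    raised = True

# With str(), the call succeeds, but the "review" is now the literal text 'nan'.
coerced = strip_tags(str(missing))

print(raised)   # True
print(coerced)  # nan
```

So under that assumption, str() makes the error go away, but the missing row stays in the dataset as a review whose text is "nan".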

I would maybe do a df.head() and a df.tail() to just double-check that it looks as expected:

Image
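Beyond eyeballing df.head() and df.tail(), it might help to list exactly which entries are not strings. The following is only a sketch: find_non_strings is a hypothetical helper, and the sample list stands in for the real "review" column.

```python
def find_non_strings(values):
    """Return (position, value) pairs for entries that are not str."""
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]

# Stand-in data; with the real DataFrame you would pass df['review'] instead.
sample = ["<br />Great movie :)", float("nan"), "Not good :("]

bad = find_non_strings(sample)
print(bad)
```

With the actual data, find_non_strings(df['review']) should work the same way, since iterating over a pandas Series yields its values. If the only offenders turn out to be NaN, df['review'].isna().sum() gives a quick count of them, which would point to missing or misparsed rows in the CSV rather than a problem in preprocessor.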

rasbt, Jan 30 '25 23:01