
Minor correction to chapter 3: logic error, outliers treatment

Open abrahambarrer opened this issue 1 year ago • 0 comments

In section 3.3.3, on outlier treatment, the book suggests that we can clean up values that deviate from the median by more than 2 or 3 standard deviations:

df2 = df.drop( df.index[(df.income =='>50K\n') & 
       (df['age'] > df['age'].median() + 35) & 
       (df['age'] > df['age'].median() -15)
       ])

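This boolean index is erroneous: any age above median + 35 is also above median - 15, so the second age condition is redundant and only values more than 35 above the median are dropped. A correction might be to replace > with < in the second condition and to join the two age conditions with | instead of &: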

df2 = df.drop( df.index[(df.income=='>50K\n') & 
       ((df['age'] > df['age'].median() + 35) | (df['age'] < df['age'].median() - 15))])
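To illustrate the difference, here is a minimal sketch on a small made-up DataFrame (the ages and income labels below are hypothetical, not taken from the book's dataset). It compares the rows selected by the original mask with those selected by the proposed one:

```python
import pandas as pd

# Hypothetical toy data, only to compare the two masks.
df = pd.DataFrame({
    "age": [20, 40, 41, 42, 43, 90, 95, 18],
    "income": [">50K\n", ">50K\n", "<=50K\n", ">50K\n",
               "<=50K\n", ">50K\n", "<=50K\n", "<=50K\n"],
})

median_age = df["age"].median()          # 41.5 for this toy data
high_income = df["income"] == ">50K\n"

# Mask as quoted from the book: the second age condition is redundant.
original_mask = (high_income
                 & (df["age"] > median_age + 35)
                 & (df["age"] > median_age - 15))

# Proposed mask: flag ages far above OR far below the median.
corrected_mask = (high_income
                  & ((df["age"] > median_age + 35)
                     | (df["age"] < median_age - 15)))

dropped_original = df[original_mask].index.tolist()
dropped_corrected = df[corrected_mask].index.tolist()

# Drop the rows flagged by the corrected mask.
df2 = df.drop(dropped_corrected)

print(dropped_original)   # only the high-age row
print(dropped_corrected)  # the low-age and the high-age rows
```

On this toy data the original mask misses the 20-year-old in the '>50K' group, while the corrected mask flags both extremes.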

abrahambarrer, Jul 25 '24 17:07