Feature handling empty values

Open BalajiG2000 opened this issue 4 years ago • 1 comments

In real-world data, some elements are absent for various reasons, such as corrupt data, failure to load the information, or incomplete extraction. Handling these missing values is one of the greatest challenges analysts face, because the way they are handled affects how robust the resulting data models are. This pull request adds the ability to handle empty values to the DataFrame project. Consider the example below:

    df := DataFrame withRows: #(
        #( Barcelona 1.609 nil 3 )
        #( nil nil true 4 )
        #( London 8.788 false 1 )
        #( Tokyo 5.785 nil 5 )
        #( Beijing nil false 6 ) ).
    df rowNames: #( A B C D E ).
    df columnNames: #( City Population BeenThere Position ).

Methods like replaceNils: anObject, replaceNilsWithZero, replaceNilsWithMean, replaceNilsWithMedian, and replaceNilsWithMode are self-explanatory. Below are some examples for the remaining methods.
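As a rough illustration of what mean replacement presumably does for a single column, here is a minimal sketch on a plain Array, using only core Pharo collection methods. It is not the implementation from this PR. The variable names are made up for the example, and the assumption is that the mean is computed over the non-nil values only.

```smalltalk
"Sketch only: models mean replacement on one column with a plain Array.
 Assumption: the mean is taken over the non-nil values."
| population known mean filled |
population := #( 1.609 nil 8.788 5.785 nil ).

"Keep only the values that are present."
known := population reject: [ :each | each isNil ].

"Arithmetic mean of the known values."
mean := (known inject: 0 into: [ :sum :each | sum + each ]) / known size.

"Replace every nil with the mean and leave other values untouched."
filled := population collect: [ :each | each ifNil: [ mean ] ].
```

The same pattern applies to a constant replacement value: use that constant instead of `mean` in the last step.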


df numberOfNils. Returns a Dictionary with the total number of nil values in each column.

Key Value
#City 1
#Population 2
#BeenThere 2
#Position 0
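The counts in the table can be reproduced by hand from the example rows. The following is a minimal sketch using plain Arrays and a Dictionary rather than the DataFrame API. The variable names are illustrative and this is not the code from the PR.

```smalltalk
"Sketch only: counts nils per column for the example data."
| rows columnNames counts |
rows := #(
    #( Barcelona 1.609 nil 3 )
    #( nil nil true 4 )
    #( London 8.788 false 1 )
    #( Tokyo 5.785 nil 5 )
    #( Beijing nil false 6 ) ).
columnNames := #( City Population BeenThere Position ).

counts := Dictionary new.
columnNames withIndexDo: [ :name :index |
    "Count the rows whose value in this column is nil."
    counts
        at: name
        put: (rows count: [ :row | (row at: index) isNil ]) ].

"counts now maps #City to 1, #Population to 2, #BeenThere to 2 and #Position to 0."
```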

df hasNilsByColumn. Returns a Dictionary which shows whether each column contains nil values.

Key Value
#City true
#Population true
#BeenThere true
#Position false

df hasNils. Returns true when a nil value is present anywhere in the DataFrame, and false otherwise.
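The two presence checks can be sketched on plain collections as follows. This is only a model of the described behaviour with illustrative variable names. It is not the PR's implementation.

```smalltalk
"Sketch only: per-column and whole-table nil checks on plain Arrays."
| rows columnNames hasNilsByColumn hasNils |
rows := #(
    #( Barcelona 1.609 nil 3 )
    #( nil nil true 4 )
    #( London 8.788 false 1 )
    #( Tokyo 5.785 nil 5 )
    #( Beijing nil false 6 ) ).
columnNames := #( City Population BeenThere Position ).

"Per column: is there at least one row with nil at this position?"
hasNilsByColumn := Dictionary new.
columnNames withIndexDo: [ :name :index |
    hasNilsByColumn
        at: name
        put: (rows anySatisfy: [ :row | (row at: index) isNil ]) ].

"Whole table: does any row contain a nil?"
hasNils := rows anySatisfy: [ :row | row includes: nil ].

"hasNils is true; hasNilsByColumn is false only for #Position."
```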

df removeRowsWithNils. Returns a modified DataFrame after removing all rows that contain nils. In the example above, only row C (London) remains.
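On plain collections, dropping incomplete rows amounts to a single filter. The sketch below uses core Pharo collection methods on the example rows. It is not the code from the PR.

```smalltalk
"Sketch only: keep the rows that contain no nil."
| rows completeRows |
rows := #(
    #( Barcelona 1.609 nil 3 )
    #( nil nil true 4 )
    #( London 8.788 false 1 )
    #( Tokyo 5.785 nil 5 )
    #( Beijing nil false 6 ) ).

completeRows := rows reject: [ :row | row includes: nil ].

"completeRows holds a single row: #( London 8.788 false 1 )."
```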

df replaceNilsWithPreviousRowValue. Propagates the last valid observation forward, much like ffill() in pandas.
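A forward fill over one column can be sketched on a plain Array as shown below. This is a model of the idea, not the implementation from this PR. It assumes that a nil with no earlier valid value stays nil, which is how ffill() behaves in pandas. Whether replaceNilsWithPreviousRowValue does the same is not stated here.

```smalltalk
"Sketch only: forward fill on the BeenThere column of the example.
 Assumption: a leading nil stays nil because there is nothing to propagate."
| beenThere lastSeen filled |
beenThere := #( nil true false nil false ).
lastSeen := nil.

filled := beenThere collect: [ :each |
    "Remember the most recent non-nil value, then emit it."
    each ifNotNil: [ lastSeen := each ].
    lastSeen ].

"filled is #( nil true false false false )."
```

Note that `false` is a valid observation here, so the check must test for nil specifically rather than for truthiness.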

BalajiG2000 Aug 07 '21 08:08

This is a very useful addition, thanks! I left a few comments above, and then there's the CI failure that I don't quite understand, but overall this is very nice!

khinsen Aug 09 '21 08:08

The CI failure is not related. An alternative to one of the methods added here has been added to DataFrame in the meantime. I'll merge this PR and open another one to deprecate the other way.

jecisc Feb 13 '23 16:02