Feature/dataTypes
With this pull request I would like to add a new feature to the DataFrame project, dataTypes, which helps statisticians and data scientists perform data-type-oriented analysis on any dataset.
In statistics, data types play a crucial role: they must be understood before suitable statistical measures can be applied to the data, so that we can draw correct conclusions about it.
The following methods have been added:
| Method Name | Description |
|---|---|
| DataSeries calculateDataType | Returns the type of a series of values |
| DataFrame calculateDataTypes | Identifies the type of each column in a DataFrame |
| DataFrame dataTypes | Returns a dictionary of column names and their corresponding types |
| DataFrame dataTypeOfColumn: aColumnName | Returns the type of aColumnName |
| DataFrame dataTypeOfColumnAt: anIndex | Returns the type of the column at the given index |
Apart from these methods, there are also dataTypeOfColumn:put: and dataTypeOfColumnAt:put:, which let us modify the stored types, and calculateDataTypeOfColumn: and calculateDataTypeOfColumnAt:, which determine the type of an individual column.
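A short usage sketch of these accessors, built on the same sample data as the example in this description. The values mentioned in the comments are read off the dataTypes output reported here rather than verified independently. Two things are assumptions: that calculateDataTypeOfColumn: takes a column name (by analogy with dataTypeOfColumn:), and the choice of Number as the override, which is purely illustrative.

```smalltalk
| df |
df := DataFrame withRows: #(
    ('Barcelona' 1.609 true nil 4 nil)
    ('Dubai' 2.789 true nil 5 1)
    ('London' 8.788 false nil 6 4.666)).
df columnNames: #('City' 'Population' 'BeenThere' 'Medals' 'Position' 'FinalScore').

"Query a single column by name or by index.
 The dataTypes table in this description reports ByteString for City
 and SmallFloat64 for Population (column 2)."
df dataTypeOfColumn: 'City'.
df dataTypeOfColumnAt: 2.

"Override the stored type by hand. Number is an illustrative choice,
 not something the PR prescribes."
df dataTypeOfColumn: 'FinalScore' put: Number.
df dataTypeOfColumnAt: 6 put: Number.

"Determine the type of one column from its values again.
 Assumption: the argument is a column name."
df calculateDataTypeOfColumn: 'FinalScore'.
df calculateDataTypeOfColumnAt: 6.
```

The put: variants may be useful when the detected type is broader than intended, for example a numeric column that contains missing values.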
Consider the following example:
```smalltalk
df := DataFrame withRows: #(
    ('Barcelona' 1.609 true nil 4 nil)
    ('Dubai' 2.789 true nil 5 1)
    ('London' 8.788 false nil 6 4.666)).
df columnNames: #('City' 'Population' 'BeenThere' 'Medals' 'Position' 'FinalScore').
```
Evaluating df dataTypes returns the following Dictionary:
| Key | Value |
|---|---|
| City | ByteString |
| Population | SmallFloat64 |
| BeenThere | Boolean |
| Medals | UndefinedObject |
| Position | SmallInteger |
| FinalScore | Object |
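This description does not state how a column's type is derived. The reported output is, however, consistent with taking the lowest common superclass of the classes of a column's values: Medals holds only nil and is reported as UndefinedObject, while FinalScore mixes nil, an integer, and a float and is reported as Object. The block below is a minimal sketch of that idea under this assumption. The name commonSuperclassOf is hypothetical and not part of the PR, and the actual implementation may differ.

```smalltalk
| commonSuperclassOf |
"Hypothetical helper, not taken from the PR: walk up the class hierarchy
 until the candidate class covers every value seen so far."
commonSuperclassOf := [ :values |
    values
        inject: values first class
        into: [ :candidate :each |
            | result |
            result := candidate.
            [ each class includesBehavior: result ]
                whileFalse: [ result := result superclass ].
            result ] ].

commonSuperclassOf value: #(4 5 6).        "SmallInteger, like Position"
commonSuperclassOf value: #(nil nil nil).  "UndefinedObject, like Medals"
commonSuperclassOf value: #(nil 1 4.666).  "Object, like FinalScore"
```

Under this reading, a single nil in an otherwise numeric column widens its type to Object, which is one case where overriding the type with dataTypeOfColumn:put: could help.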
More short examples can be found in: Introducing dataTypes