Data-Science-For-Beginners
Missing Height value in SOCR_MLB.tsv dataset
One row of the SOCR_MLB.tsv dataset has a missing value in the 'Height' column.
To Reproduce:
- Go to the 04-stats-and-probability directory.
- Open notebook.ipynb.
- Go to the Correlation and Evil Baseball Corp section.
- Run the whole notebook.
- From this section onward, a few of the outputs show nan values.
Expected behavior: The correlation cells should produce numerical outputs, but because of the missing value, a few of them output nan instead.
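To illustrate why a single missing value is enough to produce this, here is a minimal sketch. It assumes the correlation is computed with something like `np.corrcoef` (an assumption on my part), and the data frame `toy` holds made-up numbers rather than rows from SOCR_MLB.tsv.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration; these are not rows from SOCR_MLB.tsv.
toy = pd.DataFrame({
    "Height": [74.0, 72.0, np.nan, 69.0, 75.0],
    "Weight": [180.0, 215.0, 210.0, 176.0, 209.0],
})

# np.corrcoef propagates NaN: one missing entry makes the coefficient nan.
corr_with_nan = np.corrcoef(toy["Height"], toy["Weight"])[0, 1]

# Dropping the incomplete rows first yields a numeric coefficient.
clean = toy.dropna(subset=["Height", "Weight"])
corr_clean = np.corrcoef(clean["Height"], clean["Weight"])[0, 1]

print(corr_with_nan)
print(corr_clean)
```

Dropping the incomplete row is only one possible workaround. Filling the gap (for example with the column mean) would also avoid the nan, at the cost of inventing a value for that player.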
To see the row with the missing value, go to the Analyzing Real Data section and add the following code below the first cell in that section (after the dataset has been read and printed):

    height_is_null = df['Height'].isnull()

    # Use boolean indexing to display rows where 'Height' is null
    rows_with_null_height = df[height_is_null]

    # Print the resulting DataFrame
    print(rows_with_null_height)
The output should look like this:

    640 Kirk_Saarloos CIN Starting_Pitcher 72 NaN 27.77
Note: if you want to use the notebook located in the data directory, you can press CTRL + F and search for 'Kirk_Saarloos' to find the row with the missing value.
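As a complementary check, the sketch below counts missing values in every column and lists every row that contains one, which may help confirm exactly which field is empty. The names `sample_tsv` and `df_toy` and the three-column layout are hypothetical stand-ins, not the real file or its schema.

```python
import io

import pandas as pd

# Made-up, tab-separated stand-in for the dataset (hypothetical rows and columns).
sample_tsv = (
    "Player_A\t74\t180\n"
    "Player_B\t\t210\n"
    "Player_C\t69\t176\n"
)

df_toy = pd.read_csv(
    io.StringIO(sample_tsv),
    sep="\t",
    header=None,
    names=["Name", "Height", "Weight"],
)

# Number of missing values in each column.
missing_per_column = df_toy.isnull().sum()

# Every row that has at least one missing value, in any column.
rows_with_any_null = df_toy[df_toy.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_any_null)
```

Running the same two expressions on the notebook's `df` should surface the Kirk_Saarloos row without having to name the affected column in advance.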