Data-Science-For-Beginners

Missing Height value in SOCR_MLB.tsv dataset

Open Rayycoding opened this issue 1 year ago • 0 comments

Missing height value in the SOCR_MLB.tsv dataset

The 'Height' column has a missing value in one of its rows.

  • To Reproduce:
  1. Go to the 04-stats-and-probability directory
  2. Open the notebook.ipynb
  3. Go to the Correlation and Evil Baseball Corp section
  4. Run the whole notebook
  5. From this section onward, several outputs contain nan values.
  • Expected behavior: The correlation cells should produce numerical outputs, but because of the missing value, several of them output nan instead.

  • Screenshots: (three images, not included here)

  • To see the row with the missing value, go to the Analyzing Real Data section and add this code below the first cell in that section (after the dataset is read and printed):

height_is_null = df['Height'].isnull()

# Use boolean indexing to display rows where 'Height' is null
rows_with_null_height = df[height_is_null]

# Print the resulting DataFrame
print(rows_with_null_height)

The output should look like this:

640 Kirk_Saarloos CIN Starting_Pitcher 72 NaN 27.77

NOTE: if you would rather inspect the file located in the data directory, you can press CTRL + F and search for 'Kirk_Saarloos' to find the row with the missing value.
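For anyone who wants to see the effect in isolation, here is a minimal sketch of how a single missing value can turn a correlation into nan, plus one possible workaround. It assumes the correlation is computed with something like np.corrcoef. The four-row DataFrame and its player names are made up for illustration and are not taken from SOCR_MLB.tsv.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table: all names and values are invented for
# illustration; only the idea of "one missing Height" mirrors the issue.
df = pd.DataFrame({
    "Name": ["Player_A", "Player_B", "Player_C", "Player_D"],
    "Height": [74.0, 72.0, np.nan, 76.0],
    "Weight": [180.0, 200.0, 210.0, 220.0],
})

# np.corrcoef does not skip missing values, so a single NaN in Height
# makes the Height/Weight correlation come out as nan.
raw = np.corrcoef(df["Height"], df["Weight"])[0, 1]
print(raw)

# One possible workaround: drop the incomplete rows before computing.
clean = df.dropna(subset=["Height", "Weight"])
fixed = np.corrcoef(clean["Height"], clean["Weight"])[0, 1]
print(fixed)

# pandas' own Series.corr excludes missing pairs, so it gives the same
# number without an explicit dropna.
pandas_corr = df["Height"].corr(df["Weight"])
print(pandas_corr)
```

Whether the row should be dropped, filled with a mean value, or corrected in the dataset itself is a separate decision for the maintainers; the sketch only shows that any of these would remove the nan outputs.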

Rayycoding, Oct 14 '23 09:10