PythonDataScienceHandbook
03.09 - Small issue with dtype
Dear all, at first I thought there was an issue with the conversion to int in the births example (that was an error on my side, see below). However, the following line seems incorrect, or at least confusing:
Next we set the day column to integers; previously it had been a string because some columns in the dataset contained the value 'null':
However, births['day'] is already a float right after loading, not a string:
births = pd.read_csv('data/births.csv')
print(births['day'].dtype)
float64
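A minimal sketch of why the column comes out as float64, assuming the cause is that pandas parses the text 'null' as a missing value by default. The CSV content below is a made-up miniature, not the real data/births.csv, and the names sample and cleaned are only for illustration:

```python
import io
import pandas as pd

# Hypothetical miniature of the births file; the values are made up.
csv_text = """year,month,day,gender,births
1969,1,1,F,4046
1969,1,2,F,4454
1989,1,null,F,156749
"""

sample = pd.read_csv(io.StringIO(csv_text))

# 'null' is in read_csv's default list of NA strings, so the column
# holds NaN and is stored as float64 rather than as strings.
day_dtype_loaded = sample['day'].dtype
print(day_dtype_loaded)

# The integer conversion only works once the rows with a missing day are gone.
cleaned = sample.dropna(subset=['day']).copy()
cleaned['day'] = cleaned['day'].astype(int)
day_dtype_cleaned = cleaned['day'].dtype
print(day_dtype_cleaned)
```

If this is indeed what happens, the sentence in the text could say that the column had been a float because of missing values, rather than a string.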
Admittedly, I am also a bit unhappy about how the "births" data is handled here. Going back and forth and playing with the data is difficult, especially after the following line, because births is overwritten and the data might have to be reloaded:
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')
Furthermore, calling some of the functions twice seems to introduce NaNs somehow.
I think that working on a copy of the "births" dataframe after the query operation might make this section a bit easier to deal with interactively.
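Here is a rough sketch of what I mean, not a proposal for the exact wording of the chapter. The data is made up, the names births_raw and clip_outliers are hypothetical, and I am assuming that mu and sig are computed from the percentiles as in the sigma-clipping step of the section:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the births table: 8 ordinary rows plus one outlier.
births_raw = pd.DataFrame({
    'day': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 99.0],
    'births': [4000, 4100, 3900, 4050, 3950, 4020, 3980, 4010, 150000],
})

def clip_outliers(df):
    """Return a cleaned copy; the frame passed in is left untouched."""
    # Assumed to mirror the chapter: median and a robust width from the IQR.
    q25, q50, q75 = np.percentile(df['births'], [25, 50, 75])
    mu, sig = q50, 0.74 * (q75 - q25)
    out = df.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)').copy()
    out['day'] = out['day'].astype(int)
    return out

# Running the cleaning step twice gives the same result,
# because it always starts from the unmodified births_raw.
first = clip_outliers(births_raw)
second = clip_outliers(births_raw)
print(len(births_raw), len(first), first.equals(second))
```

Keeping the loaded frame under its own name means a cell can be re-run without reloading the CSV, and the explicit .copy() avoids assigning into a view of the original frame.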
Best, Simon