PythonDataScienceHandbook 03.09 - Small issue with dtype

Open SRSteinkamp opened this issue 5 years ago • 0 comments

Dear all, at first I thought there was an issue with the conversion to int in the births example (that was an error on my side, see below). However, the following line seems incorrect or at least confusing:

Next we set the day column to integers; previously it had been a string because some columns in the dataset contained the value 'null':

However, births['day'] is already a float right after loading:

    births = pd.read_csv('data/births.csv')
    print(births['day'].dtype)

float64
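To make the point reproducible without the original file, here is a minimal sketch. The CSV text below is a made-up stand-in for data/births.csv (the values and the name births_demo are assumptions, not the real data). It relies on 'null' being one of the strings that pd.read_csv treats as missing by default, which is presumably why the column arrives as a float rather than a string:

```python
import io
import pandas as pd

# Hypothetical stand-in for data/births.csv; the values are made up.
csv_text = """year,month,day,gender,births
1969,1,1,F,4046
1969,1,1,M,4440
1969,1,null,F,8
1969,1,2,F,4454
"""

births_demo = pd.read_csv(io.StringIO(csv_text))

# 'null' is parsed as NaN by default, so the column is float, not string.
print(births_demo['day'].dtype)  # float64

# The conversion to int only works once the missing values are gone.
clean_demo = births_demo.dropna(subset=['day']).copy()
clean_demo['day'] = clean_demo['day'].astype(int)
print(clean_demo['day'].dtype)
```

So the conversion to int is still needed, but the reason would be the NaN values rather than a string column.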

Admittedly, I am also a bit unhappy about how the "births" data is handled here. Going back and forth and playing with the data is difficult, especially after births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)'), because the original variable is overwritten and the data may have to be reloaded. Furthermore, calling some of the functions twice seems to introduce NaNs somehow. I think that assigning the result of the query operation to a copy of the "births" dataframe would make this section a bit easier to work through interactively.
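As a rough sketch of what I mean (not the book's code): the small dataframe raw and the helper clean_births below are hypothetical names with made-up values, and the 0.74 factor for the robust sigma estimate is taken from the surrounding section of the notebook as I remember it. The idea is only that the loaded data stays untouched and the filtered result lives in its own copy, so cells can be re-run in any order:

```python
import numpy as np
import pandas as pd

# Hypothetical, made-up data standing in for the loaded births dataframe.
raw = pd.DataFrame({
    'year': [1969] * 6,
    'month': [1] * 6,
    'day': [1.0, 2.0, 3.0, 4.0, 5.0, 99.0],
    'births': [4046, 4440, 4454, 4548, 4200, 8],
})

def clean_births(df):
    """Return a filtered copy of df; the input is left unmodified."""
    quartiles = np.percentile(df['births'], [25, 50, 75])
    mu = quartiles[1]
    sig = 0.74 * (quartiles[2] - quartiles[0])  # robust sigma estimate
    # .copy() gives an independent dataframe instead of a view of df.
    out = df.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)').copy()
    out['day'] = out['day'].astype(int)
    return out

births_clean = clean_births(raw)
births_again = clean_births(raw)  # re-running gives the same result

print(len(raw), len(births_clean))
```

Because raw is never reassigned, running the cleaning step twice starts from the same input each time, and no reload of the CSV is needed.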

Best, Simon

Jul 11 '19 07:07 SRSteinkamp
