python-ecology-lesson
Clarifying challenge and removing unnecessary step
Setting sex to x is not used in the next step of the challenge.
When I took the course, I found challenge 1 in the "Challenge - Putting it all together" section of lesson 03-index-slice-subset a bit confusing:
- Create a new DataFrame that only contains observations with sex values that are not female or male. Assign each sex value in the new DataFrame to a new value of 'x'. Determine the number of null values in the subset.
The Lesson guide shows this solution:
new = surveys_df[~surveys_df['sex'].isin(['M', 'F'])].copy()
new['sex']='x'
print(len(new))
This returns: 2511, which is the same as:
sum(surveys_df['sex'].isnull())
However, as written in the lesson guide, setting 'sex' to 'x' serves no purpose: len() counts the rows of the whole new DataFrame, so the 'sex' column is not used to count anything. Setting the 'sex' column to 'x' and then asking about the null values in the DataFrame was confusing, because the question could be asking for the null values in 'sex' only:
print(len(new[pd.isnull(new['sex'])]['sex']))
which returns 0, or in all the columns of the new DataFrame:
print(len(new[pd.isnull(new).any(axis=1)]))
which returns 2449.
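To make the ambiguity concrete, here is a minimal sketch on a made-up six-row DataFrame. The toy data and the names `rows_in_subset`, `null_in_sex`, and `rows_with_any_null` are my own illustration, not the real surveys data or anything from the lesson. It only shows that the three readings of "number of null values" can give three different answers:

```python
import pandas as pd

# Hypothetical toy data standing in for surveys_df (not the real dataset)
surveys_df = pd.DataFrame({
    'record_id': [1, 2, 3, 4, 5, 6],
    'sex': ['M', 'F', None, 'M', None, None],
    'weight': [40.0, None, None, 48.0, 7.0, None],
})

# Same steps as the lesson guide's solution
new = surveys_df[~surveys_df['sex'].isin(['M', 'F'])].copy()
new['sex'] = 'x'

# Reading 1: number of rows in the subset (what the guide prints)
rows_in_subset = len(new)

# Reading 2: null values left in 'sex' after overwriting it with 'x'
null_in_sex = new['sex'].isnull().sum()

# Reading 3: rows of the subset with a null value in any column
rows_with_any_null = new.isnull().any(axis=1).sum()

print(rows_in_subset, null_in_sex, rows_with_any_null)
```

On this toy data the three counts are 3, 0, and 2, mirroring the 2511 / 0 / 2449 split above.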
With the proposed edit in the lesson, the lesson guide could be:
new = surveys_df[~surveys_df['sex'].isin(['M', 'F'])].copy()
# Print the number of rows in the new DataFrame
new_no_rows = len(new)
print(new_no_rows)
2511
# Count the rows in surveys_df that have a null value in sex
surveys_df_sexnull = sum(surveys_df['sex'].isnull())
# Compare the two counts
new_no_rows == surveys_df_sexnull
True
Hope this makes sense, or maybe I missed something?
Sorry I'm so late getting back on this - busy month. I like these edits, and I think they make the challenge clearer. Happy if you are, @maxim-belkin