dime-python-training

feedback - session pandas May 13

Open kbjarkefur opened this issue 4 years ago • 10 comments

kbjarkefur avatar May 13 '21 13:05 kbjarkefur

When creating the first df manually, spend more time explaining that this is a manual approach you typically do not use in practice, but that you start there to show the basics, and say that you will cover more ways later.

Maybe start with a slide that lists the ways you can create a pandas DataFrame, and then start with the manual way.
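
For that slide, a minimal sketch of a few common ways to build a DataFrame could look like the following. The crop names and prices are made-up illustration data, not taken from the session materials, and the in-memory CSV only stands in for a real file path:

```python
import io
import pandas as pd

# 1. Manual: a dict of columns (good for showing the basics, rarely used in practice)
crops_manual = pd.DataFrame({"Crop": ["maize", "rice"], "Price": [1.5, 2.0]})

# 2. From a list of records, e.g. rows collected in a loop
crops_records = pd.DataFrame(
    [{"Crop": "maize", "Price": 1.5}, {"Crop": "rice", "Price": 2.0}]
)

# 3. From a file: an in-memory CSV stands in for a real path here
csv_text = "Crop,Price\nmaize,1.5\nrice,2.0\n"
crops_csv = pd.read_csv(io.StringIO(csv_text))
```

All three give the same two-column table, which might help make the point that the manual way is just one entry point among several.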

kbjarkefur avatar May 13 '21 13:05 kbjarkefur

read_stata() - maybe add a note that not all metadata in a .dta file, such as labels, is supported in pandas, so one can expect the data to be read correctly, but not necessarily the metadata.
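
A small sketch of what this looks like in practice, assuming a throwaway .dta file written to a temporary folder (the file name, the `price` column and its label are hypothetical). The variable label does not travel with the DataFrame, but it can still be retrieved through the reader object:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crops.dta")  # hypothetical file name
    # Write a .dta file with a variable label attached to "price"
    df.to_stata(path, write_index=False, variable_labels={"price": "Price per kg"})

    # read_stata returns the data; the variable label is not stored on the DataFrame
    data = pd.read_stata(path)

    # The labels can be pulled out separately through the reader object
    with pd.read_stata(path, iterator=True) as reader:
        var_labels = reader.variable_labels()
```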

kbjarkefur avatar May 13 '21 14:05 kbjarkefur

from @ccsuehara in chat:

Luckily, broken-hearted people have made additional libraries to freely browse a dataframe as if it were Stata! For example this one: https://pypi.org/project/dtale/

Mention this, but maybe there is no time to cover it.

kbjarkefur avatar May 13 '21 14:05 kbjarkefur

About read_stata() as well: labels are read as strings and that might be troublesome! Just give a heads up.
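
To illustrate the heads up, here is a hedged sketch using a made-up categorical column written to a temporary .dta file. As far as I know, with the default `convert_categoricals=True` the labelled values come back as a categorical holding the label text, while `convert_categoricals=False` returns the underlying numeric codes:

```python
import os
import tempfile
import pandas as pd

# Made-up example: a categorical column is written to Stata with value labels
df = pd.DataFrame({"crop": pd.Categorical(["maize", "rice", "maize"])})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crops.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)

    # Default behaviour: value labels become the categories (label text)
    labelled = pd.read_stata(path)

    # Keep the numeric codes instead of the label text
    coded = pd.read_stata(path, convert_categoricals=False)
```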

ccsuehara avatar May 13 '21 14:05 ccsuehara

On the column label / column name slide, you tripped a little on the points you wanted to make. I think it was fine in the end, but there was room for one more bullet point, so maybe add one there.

kbjarkefur avatar May 13 '21 14:05 kbjarkefur

slide 30 - missing iloc in example

kbjarkefur avatar May 13 '21 14:05 kbjarkefur

The .loc indexer also selects columns by their labels, for example crops.loc[:, ['Price']]. Basically, we can subset anything with .loc
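
A short sketch of the label-based selection, with the positional `.iloc` equivalent next to it. The `crops` data below is invented for illustration and is not the dataset from the session:

```python
import pandas as pd

# Invented data; the index holds row labels
crops = pd.DataFrame(
    {"Price": [1.5, 2.0, 0.8], "Quantity": [10, 4, 25]},
    index=["maize", "rice", "cassava"],
)

# All rows, one column, selected by label (returns a DataFrame)
price_only = crops.loc[:, ["Price"]]

# One row label and one column label (returns a single value)
rice_price = crops.loc["rice", "Price"]

# Same idea by position with .iloc: first two rows, first column
first_two = crops.iloc[:2, [0]]
```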

ccsuehara avatar May 13 '21 14:05 ccsuehara

Maybe clarify that each condition needs its own () when combining more than one, because of operator precedence: & and | are evaluated before comparisons such as > and ==. Will expand more on this.
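
A minimal sketch of the point, using invented data and hypothetical thresholds. The second attempt drops the parentheses on purpose to show that it fails:

```python
import pandas as pd

crops = pd.DataFrame({"Price": [1.5, 2.0, 0.8], "Quantity": [10, 4, 25]})

# Parentheses make each comparison evaluate first, then & combines the two masks
cheap_and_plenty = crops[(crops["Price"] < 1.8) & (crops["Quantity"] >= 10)]

# Without them, & binds tighter than < and >=, so Python reads this as
# crops["Price"] < (1.8 & crops["Quantity"]) >= 10, which raises an error
try:
    crops[crops["Price"] < 1.8 & crops["Quantity"] >= 10]
    failed = False
except (TypeError, ValueError):
    failed = True
```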

ccsuehara avatar May 13 '21 14:05 ccsuehara

The most similar thing I've found for % and not repeating the df name is the eval function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html
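
For reference, a small sketch of the DataFrame-level counterparts, `DataFrame.query` and `DataFrame.eval`, which let you refer to columns by name without repeating the df name. The data and the `Revenue` column are made up for illustration:

```python
import pandas as pd

crops = pd.DataFrame({"Price": [1.5, 2.0, 0.8], "Quantity": [10, 4, 25]})

# query: filter rows using the column names directly inside a string
plenty = crops.query("Price < 1.8 and Quantity >= 10")

# eval: compute a new column without writing crops[...] for every term;
# by default this returns a new DataFrame and leaves crops unchanged
with_revenue = crops.eval("Revenue = Price * Quantity")
```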

ccsuehara avatar May 13 '21 14:05 ccsuehara

We can also have the index as a variable with df['ind'] = df.index
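
A quick sketch of this, plus `reset_index` as an alternative that moves the index into a column in one step. The data and the `Crop` name are invented for illustration:

```python
import pandas as pd

crops = pd.DataFrame({"Price": [1.5, 2.0]}, index=["maize", "rice"])

# Copy the index into a regular column; the index itself stays in place
crops["ind"] = crops.index

# Alternative: name the index, then move it into a column and get a fresh 0..n-1 index
flat = crops.rename_axis("Crop").reset_index()
```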

ccsuehara avatar May 13 '21 15:05 ccsuehara