python-ecology-lesson Avoid confusing, potentially ambiguous commands for slicing/indexing data frames

Open caesoma opened this issue 5 years ago • 4 comments

In episode 3 (https://datacarpentry.org/python-ecology-lesson/03-index-slice-subset/index.html, actually listed as 4. in https://datacarpentry.org/python-ecology-lesson/ ), the lesson distinguishes the .iloc indexer, which accesses entries by position, from .loc, which accesses them by identifier (label). It then shows a third possibility, surveys_df[0:3], which selects rows by position.
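To make the overlap concrete, here is a minimal sketch. The small surveys_df below is a made-up stand-in for the lesson's data (the column names echo the lesson, the values and the string row labels are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical stand-in for the lesson's surveys_df; values are invented.
surveys_df = pd.DataFrame(
    {
        "record_id": [1, 2, 3, 4, 5],
        "species_id": ["NL", "NL", "DM", "DM", "DM"],
        "weight": [40.0, 48.0, 29.0, 46.0, 36.0],
    },
    index=["a", "b", "c", "d", "e"],  # string labels so label and position differ
)

by_slice = surveys_df[0:3]         # plain slice: first three rows by position
by_iloc = surveys_df.iloc[0:3]     # explicit position-based selection
by_loc = surveys_df.loc["a":"c"]   # label-based selection, end label included

print(by_slice.equals(by_iloc))    # the plain slice duplicates .iloc here
print(by_loc.equals(by_iloc))      # same rows, reached by label instead
```

All three expressions return the same three rows in this sketch, which is why the plain slice adds a third spelling without adding a new capability.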

That command is redundant with surveys_df.iloc[0:3] and looks similar to accessing a column, i.e. df["column_name"], so it can be mistaken for selecting columns when the column names are numbers. On top of that, combining row and column positions, as in df[0:2,1], will raise an error.
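A short sketch of that failure, assuming a small frame with default integer column labels. The exact exception type is not pinned down here because it can differ between pandas and Python versions, so the code catches a generic Exception.

```python
import numpy as np
import pandas as pd

# 3x3 frame with default integer column labels 0, 1, 2 (illustrative data).
df = pd.DataFrame(np.arange(9).reshape(3, 3))

try:
    df[0:2, 1]                 # mixing a row slice and a column position in []
    raised = False
except Exception as err:       # exception type varies across versions
    raised = True
    print(type(err).__name__)

# The position-based indexer accepts the same row/column pair.
subset = df.iloc[0:2, 1]       # rows at positions 0-1, column at position 1
print(subset.tolist())         # [1, 4]
```

The bracket form only takes a single key, so a row slice and a column position cannot be combined there; .iloc takes both.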

While the command could be useful, and best practices could prevent confusing row and column identifiers, the lesson could instead say that df["col_name"] or df[["list", "of", "col_names"]] will access columns, while df.loc["index"] will access rows. That would keep position-based and identifier-based selection as separate commands for beginners.

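# example
import pandas
from numpy.random import randint

# 3x3 frame of random integers; columns get default integer labels 0, 1, 2
arr = randint(0, 10, [3, 3])
df = pandas.DataFrame(arr)

df[0]    # selects the first column (the column labelled 0)
df[0:1]  # selects the first row (a positional slice)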
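For comparison, here is a minimal sketch of the separation proposed above: brackets for columns only, .loc for rows by identifier, .iloc for rows by position. The frame, its column names, and its row labels are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data; names and values are invented for this sketch.
df = pd.DataFrame(
    {
        "species_id": ["NL", "DM", "PF"],
        "weight": [40.0, 29.0, 8.0],
        "year": [1977, 1977, 1978],
    },
    index=["a", "b", "c"],
)

one_col = df["weight"]                     # single column, returned as a Series
two_cols = df[["species_id", "weight"]]    # list of names, returned as a DataFrame
row_by_label = df.loc["b"]                 # one row, selected by its identifier
rows_by_pos = df.iloc[0:2]                 # first two rows, selected by position

print(two_cols.columns.tolist())
print(row_by_label["weight"])
print(rows_by_pos.index.tolist())
```

With this convention each spelling has one job, so a beginner never has to guess whether a bracket expression refers to rows or columns.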

Feb 07 '20 22:02 caesoma

Hi, @caesoma! Apologies for taking so long to respond.

Very good and valid point! I think the best solution would be to make learners aware of this in the form of an exercise or additional material. Would you be willing to make this contribution to the episode?

Jun 07 '20 21:06 maxim-belkin

Hi, sure, I can do that. Let me know what format this exercise should be in.

Jun 24 '20 03:06 caesoma

Could you please draft a PR that modifies the existing text and adds new text and/or an exercise? We could then discuss details such as the format. And please let me know if you need any help along the way.

Jun 25 '20 02:06 maxim-belkin

Sorry for the long delay as well. Finally got around to making the proposed changes.

Jan 08 '21 03:01 caesoma

I'm closing this issue as we worked through and accepted the relevant PR back in April.

May 25 '23 08:05 LilithElina
