r-novice-inflammation

For loops and conditions vs. apply and logical subsetting

Open dmi3kno opened this issue 7 years ago • 10 comments

I am of the strong opinion that introducing for loops and if/else statements early in the teaching program does more harm than good to R education. I understand that this is done out of a desire to keep Python and R taught consistently, but I would argue that the approach to teaching (and using) the two languages should be different.

I would argue that for loops and if/else statements should be shelved under "Advanced" topics towards the end of the lesson, and that sections on the *apply family of functions and on logical subsetting should be introduced instead. R positions itself as a vectorized language (even though under the hood it may be running highly optimized loops in compiled C code), and the R programmer is encouraged to think in terms of vectors and data frames. Introducing non-vectorized constructs breaks that frame and sets R up to fail, because the argument on speed and efficiency is lost a priori.
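To make the contrast concrete, here is a minimal base R sketch. The vector `patient_values` and the threshold are made-up illustration data, not lesson data. The same task, extracting readings above a threshold, is written once with a for loop and an if statement and once with a vectorized comparison plus logical subsetting.

```r
# Hypothetical daily inflammation readings for one patient
patient_values <- c(0, 2, 5, 7, 3, 9, 1)
threshold <- 4

# Loop-and-branch style: grow a result vector one element at a time
high_loop <- c()
for (v in patient_values) {
  if (v > threshold) {
    high_loop <- c(high_loop, v)
  }
}

# Vectorized style: the comparison returns a logical vector,
# which is then used to subset the original vector
is_high <- patient_values > threshold
high_vec <- patient_values[is_high]

print(high_vec)      # [1] 5 7 9
print(sum(is_high))  # [1] 3
```

Besides being shorter, the vectorized version avoids growing a vector inside a loop, a common source of slow R code.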

Again, if you agree, I volunteer to handpick material on the *apply functions from another lesson (I will not introduce any new concepts, but rather repackage what is already in the very good Software Carpentry curriculum) and to rework logical subsetting (perhaps also mentioning the vectorized ifelse() function) to cover the need to teach conditional logic (branching) in R.
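A minimal sketch of the vectorized ifelse() mentioned here, assuming a made-up vector of readings and an illustrative cut-off of 4: ifelse() evaluates the condition for every element at once, so it replaces a loop that branches element by element.

```r
# Hypothetical readings; the cut-off of 4 is illustrative only
readings <- c(0, 2, 5, 7, 3, 9, 1)

# Loop-and-branch version: preallocate, then decide per element
labels_loop <- character(length(readings))
for (i in seq_along(readings)) {
  if (readings[i] > 4) {
    labels_loop[i] <- "high"
  } else {
    labels_loop[i] <- "normal"
  }
}

# Vectorized version: one call over the whole vector
labels <- ifelse(readings > 4, "high", "normal")
print(labels)
```

Both produce the same character vector: "normal", "normal", "high", "high", "normal", "high", "normal".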

As I said, the loop and branching sections are good, but only as an advanced topic towards the end of the lesson material.

dmi3kno, Nov 23 '16 13:11

I agree.

RMHogervorst, Nov 25 '16 16:11

I agree that Python and R are different languages with different domains and should be treated accordingly.

My only concern is if we remove for loops until the very end of the lesson, then the entire lesson will need to be restructured. The current lesson gets users to load data, create a plot, then automatically create multiple plots from multiple datasets. This requires a for loop (unless I am mistaken).
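For reference, the multiple-files step can be written without an explicit for loop. This is a minimal sketch under stated assumptions: the file names and the `analyze_file()` helper are hypothetical, the files are small synthetic CSVs written to a temporary directory so the code runs on its own, and the helper returns per-day means rather than drawing plots. A plotting function could be passed to lapply() in the same way, called for its side effects.

```r
# Create two small hypothetical CSV files (no header row) in a temp directory
tmp_dir <- tempdir()
files <- file.path(tmp_dir, c("inflammation-01.csv", "inflammation-02.csv"))
write.table(matrix(1:6, nrow = 2), files[1], sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(matrix(7:12, nrow = 2), files[2], sep = ",",
            row.names = FALSE, col.names = FALSE)

# Hypothetical per-file analysis: read one file and return the mean
# for each day (column)
analyze_file <- function(filename) {
  dat <- read.csv(filename, header = FALSE)
  colMeans(dat)
}

# for loop version: collect results into a list keyed by file name
results_loop <- list()
for (f in files) {
  results_loop[[f]] <- analyze_file(f)
}

# apply-family version: lapply() handles the iteration and collects results
results_apply <- lapply(files, analyze_file)
names(results_apply) <- files
```

lapply() still iterates internally; what changes is that the bookkeeping of looping and collecting results is handled by the function rather than written by hand.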

The rationale behind the current lesson order is that for loops and conditionals are fundamental to programming (in other languages). Since this is the Programming with R lesson and not a Data Analysis in R lesson, it focuses more on programming concepts than on data analysis concepts.

I am more than happy to continue this discussion, as it does set a foundation for our students.

chendaniely, Mar 27 '17 19:03

I guess you could use an apply function instead of a for loop, but that is just a for loop in disguise. Since it does not really make a difference but would introduce new functions, it doesn't help in the Software Carpentry lessons. I agree with the "Programming with R" vs. "Data Analysis in R" rationale.

RMHogervorst, Mar 28 '17 06:03

My argument, then, is that there's no such thing as a "beginner programmer in R". There's only a "beginner analyst in R". For loops rarely need to be written, and they should be reserved for non-rectangular data types. For everything else R has an awesome functional programming toolbox in the base::*apply and purrr::map_* families, which (although they rely on loops in compiled C code) emphasize the functional aspect and hide away implementation details that do beginners more harm than good. This is a highly philosophical discussion, and I am ready to give in on changing the lesson if you confirm that you have taught R with for loops, tried introducing apply instead, and liked the former better.
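To illustrate the base R side of this toolbox, the sketch below computes per-patient means of a small made-up inflammation matrix (rows are patients, columns are days) three ways: a for loop, apply() over rows, and vapply() with a declared result type. purrr::map_dbl() offers a similar typed contract; base R is used here only so the example runs without extra packages.

```r
# Hypothetical inflammation matrix: 3 patients (rows) x 4 days (columns)
dat <- matrix(c(0, 1, 2, 3,
                1, 1, 4, 2,
                2, 0, 3, 5), nrow = 3, byrow = TRUE)

# Loop version: preallocate, then fill in one row at a time
patient_means_loop <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))) {
  patient_means_loop[i] <- mean(dat[i, ])
}

# Functional version: apply() passes each row (MARGIN = 1) to mean()
patient_means_apply <- apply(dat, 1, mean)

# Typed functional version: vapply() errors unless every result
# is a single number, as declared by numeric(1)
patient_means_vapply <- vapply(seq_len(nrow(dat)),
                               function(i) mean(dat[i, ]),
                               numeric(1))

print(patient_means_apply)  # [1] 1.5 2.0 2.5
```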

dmi3kno, Mar 28 '17 07:03

Hello! Having been enchanted by purrr::map() myself last year, how about a …-suppl-….Rmd that compares and contrasts it with base::apply?

katrinleinweber, Feb 01 '18 14:02

Related to #276, because both readr & purrr are in the tidyverse.

katrinleinweber, Mar 27 '18 08:03

R is evolving so fast that I no longer want to stand by base::apply(). It should be purrr::map() all the way. We tried teaching it at SWC Oslo and it worked like a charm. I highly recommend watching Hadley's cupcake rant video: https://www.youtube.com/watch?v=GyNqlOjhPCQ

There are also plenty of resources for teaching purrr, not least by Jenny Bryan.

dmi3kno, Mar 27 '18 08:03

I got another comment offline about this and am absolutely convinced we should rewrite 03-loops-R.Rmd (and drop 15-supp-loops-in-depth.Rmd, or merge it into the former).

Contributions welcome! Some inspiration thanks to @jennybc: Thinking inside the box (45min webinar).

katrinleinweber, Jun 30 '18 14:06

I've only been coding for about 6 months now, but here are my 2 cents: I think for loops (or the apply/map functions) should be taught early in R. My first R project after taking a Data Carpentry lesson involved working with over 100 CSV files. From that lesson I knew how to work with one CSV at a time, but I had to spend a lot of time on the internet figuring out how to use the apply function before I could make much progress on the project.

I guess what I'm trying to say is that for loops are an important tool that programmers need to have at their fingertips, and I think they should be taught early on.

CodeRThane, Feb 02 '21 04:02

The argument made by @dmi3kno makes sense to me as well. @CodeRThane, the idiomatic R method is to use purrr::map() or apply instead of for loops as you noticed. I'd be happy to see some concrete PRs for this issue.

HaoZeke, Feb 02 '21 09:02