R-ecology-lesson

Improve the narrative in the ggplot2 lesson

Open fmichonneau opened this issue 8 years ago • 14 comments

We present 3 types of plots in the ggplot lesson:

  • scatterplots
  • boxplots
  • time series

However, the lesson could be made more interactive, and the 3 plot types presented are neither well justified nor tied into a narrative.

Jun 10 '16 20:06 fmichonneau

Hi. At the risk of asking a naive question as an instructor in training, has there been much progress on this since then? I'd be happy to pitch in and take a stab at justifying the plots in a more example-based way. If folks have already begun this process, I'd be happy to coordinate and contribute where it would be most helpful.

For example, the brief explanation in the Challenge section of why you would make a scatterplot could be moved earlier in the lesson, perhaps alongside a statement of why you'd want to see your data with each plot type. Something like: the first thing you want to do with a dataset is look at it and see what immediately jumps out, often using a scatterplot; then you summarize the data distributions with boxplots, and so on. Applying this "exploring the data" framework to the section would then set up the iterative plot construction: adding colors, splitting into facets, etc. If this isn't useful, feel free to point me in a different direction. If it's helpful, I can take a stab at reworking the text.

Dec 08 '17 08:12 joshsteele

Hi @joshsteele Thanks for your comment. Yes, there is still room for improvement, and not much progress has been made on this (see also #271). What you outline sounds good and would be a great start. Ideally, what I'd like to see in this chapter is something that reflects more closely how a researcher would use visualization to:

  1. explore the dataset (I think what you mention)
  2. create a visualization to highlight a pattern that can be explained by a scientific hypothesis.

In other words, you are on the right track and we would welcome your contribution!

Dec 08 '17 21:12 fmichonneau

Hi @fmichonneau Glad to hear that I'm thinking in a similar direction to you folks. I will begin working on this this week and I will reach out to you with questions as they arise.

Dec 12 '17 00:12 joshsteele

What is the current status of this issue? I am looking for options for making my first contribution as an instructor in training, and this feels like something I could contribute to. Are there any specific directions on the priorities for improvement? Thank you.

Apr 28 '18 16:04 thiagosfsilva

Hi @thiagosfsilva I have started a new ggplot2 lesson in the "tidyverse-first" branch. It's very bare-bones at the moment, so any contributions would be welcome.

Note:

  1. you'll need this version of the ratdat package
  2. this will be the first episode of the lesson

Apr 29 '18 15:04 fmichonneau

Hi, I'm another instructor-in-training looking to make a contribution.

In the customization section, I am wondering if it might be better to show how to use scale_x_continuous(name = "", limits = c()) rather than teach xlab/xlim, since the same family of functions can then be used to change legends (using scale_color_*, etc.) and to do a lot of other customization.

I could draft some text for this section in the new lesson, if that would be helpful.
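To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The `toy` data frame and its values are made up for illustration and merely stand in for the lesson's survey data; the axis and legend titles are likewise assumptions.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lesson's survey data.
toy <- data.frame(
  weight = c(10, 20, 35, 50, 80),
  hindfoot_length = c(15, 18, 22, 30, 36),
  sex = c("F", "M", "F", "M", "F")
)

base_plot <- ggplot(toy, aes(x = weight, y = hindfoot_length, colour = sex)) +
  geom_point()

# Shortcut helpers: one function per property of the x axis.
p_short <- base_plot +
  xlab("Weight (g)") +
  xlim(0, 100)

# Scale functions: the axis title and limits are set in one call,
# and the same pattern (scale_<aesthetic>_<type>) renames the legend.
p_scale <- base_plot +
  scale_x_continuous(name = "Weight (g)", limits = c(0, 100)) +
  scale_colour_discrete(name = "Sex")
```

Both plots should show the same x axis; the second additionally retitles the colour legend without introducing a new kind of function.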

Jun 08 '18 20:06 cnoecker

@fmichonneau I'm quite keen on contributing some materials to the "tidyverse-first" branch, as I find that the current narrative of the course could be improved (by addressing this issue alongside #194 and #378) and it seems like this new branch is going that way. Is there an overall plan for it?

I don't know if this helps, but one possible narrative:

  • "Intro to R"
    • Keep pretty much as is.
  • "starting with data"
    • use readr::read_csv, which will mean the section on factors disappears (what a relief that would be... this is always so abstract and exotic for students, we can introduce factors later in plotting when we want to switch order of labels, for example)
    • Instead of factors, we could use the rest of that session to explore missing data, which I don't think we emphasise enough in the current course. The packages visdat and naniar seem to fit very well with the general spirit of these lessons and make it relatively intuitive and fun.
  • "Data visualisation" - before manipulation
    • As mentioned above, make this kind of what a researcher would do - ask questions and visualise, ask more questions, more visualisation, and repeat...
    • Finalise the session by admitting that we are a bit limited, as we would ideally like to filter our data, summarise in different ways, and so on
    • I don't think we need to spend too much time on customisation, apart from saying to students that every single thing in the plot is changeable and direct them to the right resources (or have some examples in the materials that are not covered interactively). A couple of exceptions might be axis labels and maybe adding one of the pre-built themes just to show how easily the overall appearance can be changed.
  • "Data manipulation"
    • Now we go into dplyr land, but always anchored with visualisation. So, for example, we can do some filtering of data, and then visualise it, reinforcing the knowledge from the previous lesson
    • Introduce pipes, and show how they can also be integrated with ggplot2 for interactive data exploration
    • Summarise data, and again visualise the result
    • Gather and spread data - motivate the problem by asking: "What if we want to look at the correlation of weights between two genera?". Then we need to spread the data, so it's immediately clear why we would want to do something like that.
    • Introduce factors, as something useful when we want to change the order of labels in the graph - e.g. if we want "M" before "F" when facetting/colouring by sex
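As a small sketch of that last point, assuming a made-up `toy` data frame (not the lesson's data): setting the factor levels explicitly is what puts "M" before "F" in facets and legends.

```r
library(ggplot2)

# Hypothetical toy data; values are invented for illustration.
toy <- data.frame(
  sex = c("F", "M", "F", "M"),
  weight = c(40, 45, 42, 48)
)

# By default the levels would be alphabetical ("F", "M").
# Setting them explicitly changes the order used in the plot.
toy$sex <- factor(toy$sex, levels = c("M", "F"))

p_sex <- ggplot(toy, aes(x = sex, y = weight)) +
  geom_boxplot() +
  facet_wrap(vars(sex))
```

Introduced this way, factors arrive as the answer to a plotting problem the learner has just run into, rather than as an abstract data type.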

Sorry, this was a bit long in the end. Does this make sense, is it OK to start some contributions along these lines?

Jul 03 '18 19:07 tavareshugo

@tavareshugo we started doing some work on this in the tidyverse-first branch which is rendered at https://dc-r-ecology-dev.netlify.com/

Very early stages, but we welcome feedback, ideas, and contributions there!

Jul 03 '18 21:07 fmichonneau

I think that instead of talking about "how to make a ____ plot", we should talk more about plotting types of data against each other (i.e., numerical vs. numerical, categorical vs. numerical) and which geoms suit each situation.

I'd be willing to submit a pull request on the current lesson. Or is the switch to tidyverse first being made?
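A rough sketch of what that framing could look like in code, using an invented `toy` data frame (the column names mirror the lesson's, but the values are made up):

```r
library(ggplot2)

# Hypothetical toy data; values are invented for illustration.
toy <- data.frame(
  species = rep(c("DM", "DO", "PP"), each = 4),
  weight = c(40, 42, 45, 47, 48, 50, 52, 55, 15, 16, 17, 19),
  hindfoot_length = c(35, 36, 36, 37, 35, 36, 37, 38, 21, 21, 22, 23)
)

# Numerical vs. numerical: points show the relationship.
p_num_num <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point()

# Categorical vs. numerical: boxplots summarise the distribution per group.
p_cat_num <- ggplot(toy, aes(x = species, y = weight)) +
  geom_boxplot()

# A single categorical variable: bars count the rows in each group.
p_cat <- ggplot(toy, aes(x = species)) +
  geom_bar()
```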

Dec 10 '18 19:12 maglet

Regarding the plots, I taught the Data Analysis and Visualisation section a few weeks ago, and one thing that helped students was to first describe what boxplots are, which is missing from the current material. Here is an image I used to explain boxplots: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

Apr 30 '19 12:04 ac812

Hi, I have a few thoughts on improving the visualization with ggplot2 lesson.

  1. Histograms: Histograms are often used, and ggplot2 has several histogram-like geoms. Some need only an x aesthetic; others need both x and y. It would be helpful to cover which is which in a section on these bar-shaped visualizations.

  2. Scatter plots: After plotting, I often add geom_smooth on top of the scatterplot to draw trend lines. People can draw linear regressions right on the scatterplot, which seems useful.

  3. Piping into ggplot2 after dplyr: If simple data manipulation is needed before visualization (e.g. sums by group), it is helpful to know that one can plot directly at the end of the dplyr pipeline, without specifying the ggplot(data = data_name) part.
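The three points above could be sketched roughly as follows. The `toy` data frame is invented for illustration, and the choice of `method = "lm"` is one assumption about what "linear regressions" would mean in the lesson.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; values are invented for illustration.
toy <- data.frame(
  species = rep(c("DM", "DO", "PP"), each = 4),
  weight = c(40, 42, 45, 47, 48, 50, 52, 55, 15, 16, 17, 19),
  hindfoot_length = c(35, 36, 36, 37, 35, 36, 37, 38, 21, 21, 22, 23)
)

# 1. geom_histogram() needs only x: it bins and counts the values itself.
p_hist <- ggplot(toy, aes(x = weight)) +
  geom_histogram(bins = 5)

# 2. geom_smooth() layered on a scatterplot draws a fitted line;
#    method = "lm" asks for a linear regression.
p_smooth <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# 3. Piping a dplyr summary straight into ggplot(): the data argument is
#    supplied by the pipe. geom_col() needs both x and y because the
#    bar heights are precomputed.
p_col <- toy %>%
  group_by(species) %>%
  summarise(mean_weight = mean(weight)) %>%
  ggplot(aes(x = species, y = mean_weight)) +
  geom_col()
```

Note the switch from `%>%` to `+` once `ggplot()` is called, which is a common stumbling block worth pointing out to learners.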

Jan 23 '20 21:01 bolimsydneyson

Reading through all the comments above was very interesting and enlightening, not least for a novice instructor like myself! I note that in addition to streamlining the narrative (and the reasoning behind which type of plot is useful for which purpose), it is also important to have clear examples that do not simultaneously introduce new concepts and add a level of complexity.

One example of the latter is the final example, under "Exporting plots": here, the code suddenly introduces the grid.arrange() function without mentioning that it requires a further package, gridExtra (which also provides arrangeGrob()), to be installed and loaded.

## This also works for grid.arrange() plots
combo_plot <- grid.arrange(spp_weight_boxplot, spp_count_plot, ncol = 2, 
                           widths = c(4, 6))
ggsave("combo_plot_abun_weight.png", combo_plot, width = 10, dpi = 300)

(At least on my system, with only tidyverse loaded, the above piece of code didn't execute, but after googling and installing gridExtra, I got some reasonable output ;-)
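For reference, a self-contained sketch of what worked, assuming gridExtra has been installed with install.packages("gridExtra"). The two plots here are stand-ins built from invented data, not the lesson's actual spp_weight_boxplot and spp_count_plot, and the output goes to a temporary directory.

```r
library(ggplot2)
library(gridExtra)  # provides grid.arrange(); not attached by library(tidyverse)

# Hypothetical toy data; values are invented for illustration.
toy <- data.frame(
  species = rep(c("DM", "DO", "PP"), each = 4),
  weight = c(40, 42, 45, 47, 48, 50, 52, 55, 15, 16, 17, 19)
)

# Stand-ins for the two plots built earlier in the lesson.
spp_weight_boxplot <- ggplot(toy, aes(x = species, y = weight)) +
  geom_boxplot()
spp_count_plot <- ggplot(toy, aes(x = species)) +
  geom_bar()

# grid.arrange() draws the combined layout and returns it invisibly,
# so the result can be passed on to ggsave().
combo_plot <- grid.arrange(spp_weight_boxplot, spp_count_plot, ncol = 2,
                           widths = c(4, 6))

out_file <- file.path(tempdir(), "combo_plot_abun_weight.png")
ggsave(out_file, combo_plot, width = 10, height = 5, dpi = 300)
```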

May 11 '21 14:05 DrMaggie

Please bear with me as I'm not very familiar with GitHub. I've tried to make the comment below legible. Formatting feedback appreciated!

@fmichonneau Looking at the link you shared earlier: https://github.com/datacarpentry/R-ecology-lesson/blob/tidyverse-first/01-visualizing-ggplot.Rmd

It looks like installation instructions for the ratdat package need to be added, since a simple install in R version 4.1.0 isn't working. I suggest this edit to line 38:

library(devtools)
install_github("weecology/ratdat")
library(ratdat)

Note: after loading the ratdat library, I couldn't find the portal_dipo data.frame.

I've read through the above comments and I think the justification of different plot types hasn't been addressed yet.

Line 27: add the following introductory text

Plots are a powerful way to:

  • explore your data frame, e.g. look at the distribution of continuous data
  • explore relationships between two or more columns
  • present your data and findings to others

Line 54

a column for every dimension

is a bit confusing to me, I would change this to:

a column for every variable

Line 92

A scatter plot is a great way to visually explore the relationship between two columns containing continuous data. It allows you to check for possible patterns.

Challenge

What does this scatter plot tell you about the relationship between weight and hindfoot_length?

Line 156

  1. Examine the plot with a different color for each species. Does it look like there is a relationship between these two variables within species?

Jun 01 '21 22:06 YaraRAA

I suggest (re)moving the final section, 'Arranging plots', which covers installing and using 'patchwork' to arrange multiple ggplot objects.
It seems beyond the scope of an introduction to data visualization and would be better placed in a module on report generation.

Jun 27 '22 18:06 GitHubDoug