
Exercises 5.6.2 #2

Open HossamGhorab opened this issue 3 years ago • 0 comments

https://github.com/kangnade/ggplot2-solutions/blob/e6ef9e3b271599f6e97afc0f8a3f012f276f9385/ggplot2_solutions_chapter5.Rmd#L218

This took me a while.

  • Plot#1: stat = "ecdf" plots the empirical cumulative distribution function (ECDF) of the variable mpg$displ. (Note the monotonically increasing steps & the y-axis range of 0 through 1.) Reproduce the plot using: ggplot2::mpg %>% ggplot(aes(displ)) + geom_line(stat = "ecdf", size = 1.2)

  • Plot#2: stat = "qq" plots the observed sample values of a variable against the corresponding theoretical quantiles of a normal distribution. It is used as an informal test of normality for a given random variable. Note the labels of the axes (sample vs. theoretical) & the standardized breaks used for the theoretical axis. Reproduce the plot using: ggplot2::mpg %>% ggplot(aes(sample = displ)) + geom_point(stat = "qq", size = 3)

  • Plot#3: stat_function(). This is the tricky one. Although the base layer is a geom_density() plot, the overlaid qq-like density curve can lead one astray! Using stat_qq() or its close relatives led me nowhere: yes, we can specify geom = "line", but they don't compute ..density.., so they can neither form the curve nor stay on the proper y-axis scale (their values go far beyond the y range of the density plot). The solution is in the stat_function documentation:

Computes and draws a function as a continuous curve. This makes it easy to superimpose a function on top of an existing plot.

Reproduce a less aesthetically appealing plot using:

    library(tidyverse); library(MASS)
    # get the parameters of the theoretical normal distribution of mpg$displ
    params <- as.list(MASS::fitdistr(mpg$displ, "normal")$estimate)
    mpg %>% ggplot(aes(displ)) + geom_density() + stat_function(fun = dnorm, args = params)
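For anyone who wants to sanity-check what the three stats are computing, here is a minimal sketch using base R counterparts (ecdf(), qqnorm(), dnorm()). This is my own cross-check, not something from the book or the solutions file; the variable names (x, Fn, qq, params) are just illustrative, and it assumes ggplot2 and MASS are installed.

```r
library(ggplot2)  # only needed here for the mpg data set
library(MASS)     # fitdistr()

x <- mpg$displ

# Plot 1: the ECDF is a step function that reaches 1 at the largest value
Fn <- ecdf(x)
Fn(max(x))  # 1

# Plot 2: Q-Q values pair theoretical normal quantiles with the sample
# values, so the "sample" axis stays on the data scale, not a density scale
qq <- qqnorm(x, plot.it = FALSE)
range(qq$y)  # same as range(x)

# Plot 3: maximum-likelihood normal fit; the mean is the sample mean and
# the sd uses the n (not n - 1) denominator
params <- as.list(fitdistr(x, "normal")$estimate)
all.equal(params$mean, mean(x))
all.equal(params$sd, sqrt(mean((x - mean(x))^2)))

# The fitted curve drawn by stat_function() is a proper density:
# it integrates to (approximately) 1
integrate(dnorm, -Inf, Inf, mean = params$mean, sd = params$sd)$value
```

The point of the Q-Q check is that its values live on the data scale, which is why stat_qq() with geom = "line" cannot share a y axis with geom_density(), while dnorm() evaluated with the fitted parameters can.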

Warmly

HossamGhorab, Sep 30 '21 19:09