ggplot2-solutions
Exercises 5.6.2 #2
https://github.com/kangnade/ggplot2-solutions/blob/e6ef9e3b271599f6e97afc0f8a3f012f276f9385/ggplot2_solutions_chapter5.Rmd#L218
This took me a while.
- Plot #1: stat = "ecdf" plots the empirical cumulative distribution function (ECDF) of the variable mpg$displ. Note the monotonically increasing steps and the y-axis range of 0 through 1. Reproduce the plot using:
  ggplot2::mpg %>% ggplot(aes(displ)) + geom_line(stat = "ecdf", size = 1.2)
- Plot #2: stat = "qq" plots the observed sample values of a variable against the corresponding theoretical quantiles of a normal distribution. It is used as an informal check of normality for a given random variable. Note the axis labels (sample vs. theoretical) and the standardized breaks used for the theoretical axis. Reproduce the plot using:
  ggplot2::mpg %>% ggplot(aes(sample = displ)) + geom_point(stat = "qq", size = 3)
- Plot #3: stat_function(). This is the tricky one. The base layer is a geom_density() plot, but the overlaid curve is easy to mistake for something related to the qq plot. Using stat_qq() or its close relatives led me nowhere: yes, we can specify geom = "line", but they don't compute ..density.., so they can neither form a density curve nor stay on the density scale of the y axis (their y values are sample values of displ, far larger than the density values). The solution is in the stat_function() documentation:
Computes and draws a function as a continuous curve. This makes it easy to superimpose a function on top of an existing plot.
Reproduce a less aesthetically appealing version of the plot using:
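library(tidyverse)  # provides ggplot2 (including the mpg data) and %>%
# Fit a normal distribution to mpg$displ by maximum likelihood. MASS::fitdistr() is called
# with its namespace, so MASS is not attached and does not mask dplyr::select().
params <- as.list(MASS::fitdistr(mpg$displ, "normal")$estimate)  # list with elements mean and sd
# Kernel density estimate of displ with the fitted normal density drawn on top
mpg %>% ggplot(aes(displ)) + geom_density() + stat_function(fun = dnorm, args = params)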
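To make the params list above less of a black box, here is a minimal sketch of what it should contain, assuming the MASS and ggplot2 packages are installed. For the normal family, the maximum-likelihood estimates are the sample mean and the standard deviation computed with an n denominator (not the n - 1 that sd() uses). The names x, n, fit, mle_mean and mle_sd are only illustrative.

```r
# Sketch: reproduce the normal-fit estimates from MASS::fitdistr() by hand.
x <- ggplot2::mpg$displ
n <- length(x)

fit <- MASS::fitdistr(x, "normal")$estimate   # named numeric vector: mean, sd

mle_mean <- mean(x)
mle_sd   <- sqrt(mean((x - mle_mean)^2))      # divides by n, not n - 1

# Each comparison is expected to hold up to floating-point tolerance
all.equal(unname(fit["mean"]), mle_mean)
all.equal(unname(fit["sd"]), mle_sd)
all.equal(mle_sd, sd(x) * sqrt((n - 1) / n))  # relation to the usual sd()
```

So, under these assumptions, the overlaid curve is simply dnorm() evaluated with the sample mean and the (slightly smaller) maximum-likelihood standard deviation of displ.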
Warmly