naniar
naniar copied to clipboard
Bugfix geom_miss_point. Make lables factors
Description
Fix error occurring when shape is passed in mapping. Basically when one tried to specify shape in aes passed to geom_miss_point (what technically should be possible) for some reason deafult_aes was used for colour that didn't have ..missing.. in env (nor parent env) resulting in error.
Additionally I introduced small change to how labels are constructed. Now they are factor instead of character so the Missing (and by extension Not Missing) will always have the same color regardless of input data. Previously have we split our data into two sub-data frames where one of them would have only complete cases, while second would also include NAs, then plot for the first sub-data frames appeared with different color mappings (on Missing and Not Missing) than second sub-data frame.
Related Issue
https://github.com/njtierney/naniar/issues/290
Example
Thanks to that code below will work
data <- DALEX::apartments
data <- missForest::prodNA(data)
data$no.rooms <- as.factor(data$no.rooms)
library(ggplot2)
library(naniar)
ggplot(data = data, mapping = aes(.data$m2.price, .data$surface)) +
geom_miss_point()
ggplot(data = data, mapping = aes(.data$m2.price, .data$surface, color = .data$district)) +
geom_miss_point()
ggplot(data = data, mapping = aes(.data$m2.price, .data$surface, shape = .data$district)) +
geom_miss_point()
ggplot(data = data, mapping = aes(.data$m2.price, .data$surface, shape = .data$no.rooms, color = .data$district)) +
geom_miss_point()
previously 3rd and 4th plot would not appear (error would be returned instead)
Tests
I made changes to code that is already covered by unit tests. Addiitioanlly related tests were updated.
NEWS + DESCRIPTION
Done
Just to help with the review of this, here's my previous construction from #290 with these changes applied
library(naniar)
library(ggplot2)
mydat <- mtcars
mydat$mpg[1:10] = NA
ggplot(mydat, aes(x = mpg, y = disp, shape = factor(cyl))) +
geom_miss_point(size = 4)
Created on 2021-11-30 by the reprex package (v2.0.1)
FYI, I'm not sure if you need/want to keep the default colour order, but with these changes the Missing points are blue and the Not Missing points are red. That's a consequence of making Missing be the last entry in the factor. With all Not Missing this is consistent
mydat <- mtcars
ggplot(mydat, aes(x = mpg, y = disp, shape = factor(cyl))) +
geom_miss_point(size = 4)
but with all Missing the points are mis-labelled Not Missing
mydat <- mtcars
mydat$mpg = NA
ggplot(mydat, aes(x = mpg, y = disp, shape = factor(cyl))) +
geom_miss_point(size = 4)
Created on 2021-11-30 by the reprex package (v2.0.1)
The reason here seems to be that passing all NA
through GeomPoint
results in a vector of non-missing values
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
attr(,"class")
[1] "mapped_discrete" "numeric"
I can suggest an improvement to label_miss_2d
of
# all missing
if (all(x1 == 1) && inherits(x1, "mapped_discrete")) x1[] <- NA
if (all(x2 == 1) && inherits(x2, "mapped_discrete")) x2[] <- NA
temp <- data.frame(x1, x2) %>% is.na %>% rowSums() %>% factor(levels = c("0", "1", "2"))
forcats::fct_recode(temp, "Not Missing" = "0", "Missing" = "1", "Missing" = "2")
but this still results in only a single legend entry (albeit the correct one, now).
In all of these examples, as well as the 3rd DALEX one, the shape legend is broken (symbols not rendered).
Let us know if there's anything we can do to help reach a solution.
Thank you @jonocarroll so much for your help.
First of all I fixed broken legend for shape. For some reason default_aes NA
for color was causing that. When I switched it to "black"
it works just fine right now. Additionally that default coloring by missingness is handled by stat anyway, so that change to default_aes
does not cause any regression.
Regarding label, good catch, I implemented your suggestions along with appropriate comment, and it works (the label is correct) although I'm not happy with the color. For some reason this Not Missing level is dropped even if we have labels as factors. This result in Missing using the color for Not Missing. This however only happens if whole x or y is full NA, which isn't necessarily very common.
That's why Not Missing is a first level as its way more common to have sub-data frames that have some missing and no missing at all (then colors are ok), than a sub-data frame with all NAs.
Thanks so much for this! This is excellent. I'll take a quick look soon at this locally but I'm sure it's all good.
Hi @jonocarroll and @maksymiuks - apologies this has taken me forever to look into this.
I'm a bit stuck on the fact that the colours are now swapped - I tried to change the factor order to the opposite, but this didn't work, but I'd really like to have this keep the colours the same as previous, I'm not quite sure how to manage this - do you have any thoughts?
As far as I know (and that may be vastly out of date) if you want a colour to be stable you need to specify it with a manual scale. I'd suggest manually specifying colours for 'Missing' and 'Not Missing' regardless of the data, i.e. always having those legend entries present.
OK I'll do that I reckon - that will also mean I can target https://github.com/njtierney/naniar/issues/148 at the same time, I've been meaning to set the colours to something colourblind safe!
It is not as simple as I thought to set the colours manually via the custom geom - have you had experience doing this, @jonocarroll ?
I haven't, sorry. I can try to reacquaint myself with this code sometime and see what I can do.
Thanks! I'll try and have a stab as well, it's been a while since I've written a geom
While I would prefer if the colours stayed the same, I think this solves an important issue so let's merge this once checks are done :)
Resolves #290
Aha! Figured it out - just needed to "revel" the factor levels