ggplot2 Should the shape scale `na.value` default to something?

library(tidyverse)

p <- palmerpenguins::penguins |>
  ggplot() +
  geom_point(
    aes(x = flipper_length_mm, y = body_mass_g, colour = sex, shape = sex),
  )

p
#> Warning: Removed 11 rows containing missing values or values outside the scale range
#> (`geom_point()`).


p +
  scale_shape_discrete(na.value = 78)
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Created on 2025-09-22 with reprex v2.1.1

Sep 22 '25 07:09 davidhodge931

The answer to 'could it' is 'yes', but perhaps the bigger question is 'should it' (and if so, what should it be)?

Sep 22 '25 11:09 teunbrand

I think it should be something for similar reasons that colour/fill scales have it grey.

Not sure what it should be. Something a bit obscure probably..

Sep 22 '25 19:09 davidhodge931

So the same argument could be made for a linetype scale, and I also wouldn't know what best represents a missing linetype. And if we'd accept this for all discrete scales, why shouldn't we do this for continuous scales like size and linewidth as well (where it is also unclear what should be a good representation for missing values)? I think there is a risk of escalation here.

An argument in favour of grey as the missing colour value is that grey is the 'blandest' of all colours, lacking all the eccentricity you'd normally want out of a colour palette. I have trouble imagining the 'blandest' shape, linetype, size or linewidth.

Sep 24 '25 13:09 teunbrand

The only candidate I could imagine working would be the Unicode replacement character �

(This is not a serious suggestion.)

Sep 24 '25 16:09 joranE

I often use the X / cross, but it is on a case-by-case basis:

library(ggplot2)
library(dplyr)

ggplot(
  mtcars %>%
    mutate(cyl = ifelse(mpg < 12, NA, cyl)),
  aes(hp, mpg)
) +
  geom_point(aes(colour = factor(cyl), shape = factor(cyl))) +
  scale_shape_discrete(na.value = 4)

Created on 2025-09-24 with reprex v2.1.1

Note that `scale_shape_discrete(na.value = "cross")` does not work, but `scale_shape_discrete(na.value = translate_shape_string("cross"))` does.
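
As a rough sketch of that workaround (not a definitive recipe), the shape name can be translated once and the resulting numeric code reused as `na.value`. The data frame `df` and the variable `cross_code` are illustrative names, and the example assumes a ggplot2 version that exports `translate_shape_string()`.

```r
library(ggplot2)

# Translate the shape name to its numeric code; the posts in this thread
# use 4 for the cross.
cross_code <- translate_shape_string("cross")

# Illustrative data: mtcars with cyl set to NA for the lowest-mpg cars
df <- transform(mtcars, cyl = ifelse(mpg < 12, NA, cyl))

# The default discrete shape palette is numeric, so a numeric na.value fits
p <- ggplot(df, aes(hp, mpg)) +
  geom_point(aes(shape = factor(cyl))) +
  scale_shape_discrete(na.value = cross_code)
```

Printing `p` should then draw the rows with a missing `cyl` as crosses rather than dropping them.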

Sep 24 '25 20:09 smouksassi

I actually think cross (4) would be a pretty good default tbh.

Not sure what would be best for linetype. Maybe 6 if there are no better ideas, as it would be least likely to match other values in the data?

It's better from a data analysis point of view to see the NAs in the data by default, as you can always drop them easily by adding `tidyr::drop_na()` before the ggplot call.
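
For anyone who does want the missing values gone, a minimal sketch of the `tidyr::drop_na()` route might look like this. The toy data frame `df` and its columns are made up for illustration and stand in for any dataset with missing groups.

```r
library(ggplot2)

# Hypothetical toy data with two missing group labels
df <- data.frame(
  x = 1:6,
  y = c(2, 4, 3, 5, 7, 6),
  grp = c("a", "b", NA, "a", NA, "b")
)

# Drop only the rows where the mapped variable is missing
complete <- tidyr::drop_na(df, grp)

p <- ggplot(complete, aes(x, y, shape = grp)) +
  geom_point()
```

Naming the column in `drop_na()` keeps rows that are only missing values in unrelated columns.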

Sep 26 '25 06:09 davidhodge931

The problem with setting a non-logical na.value for any scale that can use multiple types is that it will run into issues with type casting. For example, if we use a numeric palette and numeric na.value, all is good.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.5.2

df <- mpg
df$drv[c(28, 213)] <- NA

ggplot(df, aes(displ, hwy, shape = drv)) +
  geom_point() +
  scale_shape_manual(
    values = c(16, 17, 15),
    na.value = 4
  )

However, if we mix a character palette and a numeric na.value, we get a horrible error.


ggplot(df, aes(displ, hwy, shape = drv)) +
  geom_point() +
  scale_shape_manual(
    values = c("circle small", "triangle", "square"),
    na.value = 4
  )
#> Error in `vec_slice<-`:
#> ! Can't convert `na_value` <double> to <character>.

Created on 2025-12-08 with reprex v2.1.1

The same problem holds for a linetype scale, that can use 1/2 or "solid"/"dashed" etc. I can already hear the reverse dependencies breaking in my nightmares.
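
One user-side way to sidestep the mismatch today, sketched here under the assumption that `translate_shape_string()` is exported by the installed ggplot2, is to convert a character palette to numeric codes so that the palette and `na.value` share a type. The names `shape_names` and `shape_codes` are illustrative. This does not address what a package-level default should be.

```r
library(ggplot2)

df <- mpg
df$drv[c(28, 213)] <- NA

# Convert the named shapes to numeric codes so they match a numeric na.value
shape_names <- c("circle small", "triangle", "square")
shape_codes <- translate_shape_string(shape_names)

p <- ggplot(df, aes(displ, hwy, shape = drv)) +
  geom_point() +
  scale_shape_manual(
    values = shape_codes,
    na.value = 4
  )
```

This reduces the second case to the first one, where both the palette and `na.value` are numeric.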

Dec 08 '25 12:12 teunbrand