stat_count gives cryptic error when used on a column of doubles
Run this trivial code (csv is attached):
library(tidyverse)
data <- read_csv("data.csv")
data %>%
  ggplot(aes(x = Tenure)) +
  geom_bar()
You get this warning:
Warning message:
Computation failed in `stat_count()`:
Elements must equal the number of rows or 1
This is the resulting plot:

The data: data.csv
Expected behavior: it should "just work". All the data in this tibble is just doubles. I reviewed the geom_bar documentation, and I see no contraindications for this working.
If I have done something wrong here, then this becomes a feature request for a useful error message or improved documentation.
Bar plots are for categorical data, histograms for numerical data. You're trying to make a bar plot from numerical data. That makes no sense. Try geom_histogram() or turn your data into a factor.
I was wondering if that was the case, and I sympathize with your point.
While you are correct that the Tenure column is numerical data, it still has only 36 unique values (categories) across 835 observations (confirmed via unique(data$Tenure)).
data <- read_csv("data.csv", col_types = "c") does cause it to work, but this seems unnecessary since, again, there are 36 unique values.
All my bloviating aside, it would be great if the error message or documentation could be adjusted to help with cases like this.
Aha, rounding the values causes it to work:
data <- read_csv("data.csv") %>% mutate(Tenure = round(Tenure, 2))
Something odd is going on here. I wonder if it's getting tripped up by some of the values being doubles that need rounding? E.g., row 8 of the CSV is 1.7999999999999998.
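A quick base R check of that hunch, as a minimal sketch. The only value taken from the CSV is the 1.7999999999999998 quoted above; the 1.8 next to it is a made-up comparison value, not something I confirmed is in the data.

```r
# Hypothetical pair: 1.8 (assumed) and the value quoted from row 8 of the CSV.
x <- c(1.8, 1.7999999999999998)

# Compared exactly, the two doubles are distinct ...
x[1] == x[2]
length(unique(x))

# ... but they collapse to the same label once converted to text,
# which is what factor() does when it builds its levels.
as.character(x)
nlevels(factor(x))

# Printing with more significant digits makes the difference visible.
print(x, digits = 17)
```

If the two counts disagree (two unique values, one factor level), that would explain why rounding makes the plot work.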
Minimal reprex:
library(ggplot2)
df <- data.frame(x = rep(c(1, 2), 5) + rep(c(0, -2.220446e-16), c(4, 1)))
df
#>    x
#> 1  1
#> 2  2
#> 3  1
#> 4  2
#> 5  1
#> 6  2
#> 7  1
#> 8  2
#> 9  1
#> 10 2
ggplot(df, aes(x)) + geom_bar()
#> Warning: Computation failed in `stat_count()`:
#> Elements must equal the number of rows or 1

Created on 2022-03-15 by the reprex package (v2.0.1)
Since this seems like a floating-point buglet, I think it's worth taking a look to see what's going wrong.
It seems the problem is that vctrs::vec_unique() (which is used in unique0()) and as.factor() (which is used in tapply()) apply different criteria for deciding whether two values are the "same".
https://github.com/tidyverse/ggplot2/blob/a979ffd26cdb456d54e2671c2eed16c65bc878b7/R/stat-count.r#L79-L89
df <- data.frame(x = rep(c(1, 2), 5) + rep(c(0, -2.220446e-16), c(4, 1)))
ggplot2:::unique0(df$x)
#> [1] 1 2 1 2
tapply(rep(1, times = nrow(df)), df$x, sum, na.rm = TRUE)
#> 1 2
#> 5 5
as.factor(df$x)
#> [1] 1 2 1 2 1 2 1 2 1 2
#> Levels: 1 2
Created on 2022-07-23 by the reprex package (v2.0.1)
In such a case, should we add a tolerance or treat the values as unequal? If we treat them as unequal, we could replace tapply() with rowsum().
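To make the second option concrete, here is a rough sketch comparing the two grouping functions on the reprex data. The variable names (w, by_factor, by_value) are mine, and I am assuming rowsum() groups on the exact values the way unique() does; this is not the actual stat_count() code.

```r
# Same data as the reprex above: four distinct doubles that print as 1 and 2.
x <- rep(c(1, 2), 5) + rep(c(0, -2.220446e-16), c(4, 1))
w <- rep(1, length(x))  # unit weights, standing in for the weight aesthetic

# tapply() groups through as.factor(), so nearly equal values share a level.
by_factor <- tapply(w, x, sum, na.rm = TRUE)

# rowsum() groups on the values themselves (assumption: exact comparison,
# like unique()), so nearly equal doubles should stay in separate rows.
by_value <- rowsum(w, x, na.rm = TRUE)

length(unique(x))  # number of x positions
length(by_factor)  # number of counts from tapply()
nrow(by_value)     # number of counts from rowsum()
```

If the last two numbers differ, only the rowsum() result lines up one-to-one with the unique x positions.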
I'd say we follow whatever vec_unique does.
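For illustration, this is roughly what following vctrs looks like on the reprex data, using vctrs::vec_count(). It is only a sketch of the grouping rule: vec_count() ignores weights, so it is not a drop-in replacement for the tapply() call, and I am assuming it uses the same equality as vec_unique().

```r
library(vctrs)

# Same data as the reprex above.
x <- rep(c(1, 2), 5) + rep(c(0, -2.220446e-16), c(4, 1))

# vec_count() tallies observations per unique value and returns a data
# frame with `key` and `count` columns; sort = "key" orders by value.
counts <- vec_count(x, sort = "key")

# Keys and counts come from the same grouping, so they cannot get out of step.
nrow(counts) == length(vec_unique(x))
sum(counts$count) == length(x)
```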