
Inconsistencies with the weighted quantile function

Open bschneidr opened this issue 5 months ago • 0 comments

Hi @mjskay,

I noticed that the weighted quantile method used in ggdist behaves inconsistently, as do similar weighted quantile functions in other R packages: the quantile estimate depends on the arbitrary order in which records in the data are sorted. Here's a reprex showing the results from ggdist as well as from the 'survey' and 'collapse' packages (the latter of which I think is trying to do the same thing as ggdist here).

# Create example data, estimate quantile
  data <- data.frame(
    x = c(2,    2,    3,    3),
    w = c(0.25, 0.15, 0.35, 0.25)
  )
  
  ggdist::weighted_quantile(x = data$x, weights = data$w, probs = 0.5)
#>      50% 
#> 2.640845
  survey:::qrule_hf7(x = data$x, w = data$w, p = 0.5)
#> [1] 2.833333
  collapse::fquantile(x = data$x, w = data$w, p = 0.5, type = 7)
#>      50% 
#> 2.785714

# Sort the data differently, then estimate the quantile again
  data2 <- data |> dplyr::arrange(x, w)
  
  ggdist::weighted_quantile(x = data2$x, weights = data2$w, probs = 0.5)
#> 50% 
#> 2.9
  survey:::qrule_hf7(x = data2$x, w = data2$w, p = 0.5)
#> [1] 2.7
  collapse::fquantile(x = data2$x, w = data2$w, p = 0.5, type = 7)
#> 50% 
#> 2.9

Created on 2025-11-05 with reprex v2.1.1

I put together a blog post that describes the underlying problem with these implementations and suggests a way to resolve it, based on some ideas from an earlier blog post of yours. I'd be curious to hear your thoughts on the problem and on what you think the best fix would be.

https://www.practicalsignificance.com/posts/weighted-quantile-weirdness/
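
To make the problem concrete: in the reprex above, the only difference between `data` and `data2` is the order of rows that share the same `x`. Below is a minimal base-R sketch of one way to get an order-invariant answer, by summing the weights of tied values before interpolating. The helper names `collapse_ties()` and `sketch_wquantile()` are hypothetical, and the midpoint interpolation rule is just one arbitrary choice for illustration. It is not what ggdist, survey, or collapse do, and not necessarily what the blog post proposes. It also assumes strictly positive weights and no missing values.

```r
# Hypothetical helper: sum the weights of tied x values.
# Sorting by (x, w) first fixes the order in which weights are added,
# so the output cannot depend on the row order of the input.
collapse_ties <- function(x, w) {
  ord <- order(x, w)
  x <- x[ord]
  w <- w[ord]
  ux <- sort(unique(x))                 # distinct values, ascending
  grp <- match(x, ux)                   # index of each row's distinct value
  data.frame(x = ux, w = as.numeric(rowsum(w, grp)))
}

# Hypothetical sketch of a weighted quantile on the collapsed data.
# Each distinct value is placed at the midpoint of its cumulative-weight
# step, and quantiles are linearly interpolated between those points.
# (Illustrative rule only; assumes positive weights and no NAs.)
sketch_wquantile <- function(x, w, probs) {
  agg <- collapse_ties(x, w)
  if (nrow(agg) == 1) return(rep(agg$x, length(probs)))
  share <- agg$w / sum(agg$w)
  mid <- cumsum(share) - share / 2
  approx(x = mid, y = agg$x, xout = probs, rule = 2)$y
}

data <- data.frame(
  x = c(2,    2,    3,    3),
  w = c(0.25, 0.15, 0.35, 0.25)
)
data2 <- data[order(data$x, data$w), ]  # same rows, ties reordered

q1 <- sketch_wquantile(data$x,  data$w,  probs = 0.5)
q2 <- sketch_wquantile(data2$x, data2$w, probs = 0.5)
identical(q1, q2)  # TRUE: row order no longer matters
```

The point of the sketch is the collapsing step rather than the particular interpolation rule: once tied values are merged into a single record, any downstream quantile definition sees the same input regardless of how the original rows were sorted.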

bschneidr, Nov 05 '25 17:11