leaflet icon indicating copy to clipboard operation
leaflet copied to clipboard

colorQuantile function requires refining to decide quantiles when data is not well distributed

Open yyz1989 opened this issue 9 years ago • 11 comments

Hi there,

First of all thank you very much for the efforts to develop leaflet for R!

Recently I encountered an issue when trying to decide quantiles for data when coloring the map. The data is not well distributed so it cannot even select 9 quantiles in 306 values. Here is the code.

> vals
  [1]   2   0   2 127   2   3   1   8   7   3   4  16   3   5   4   6   1   3   5   1   1   4   4   1  11   1   0   1   3   2   3   7  14   2   9  16   1   6   5   4   3   5  10   8   7   4   0
 [48]  11   5   4   7   3   2   8   1   3   4   3   3  21  11   6   2  10   5   5   2   2   5   2   5  10  27   0   5   9   6   6   5   4  20   7   4   6   8   3   5  20   4   2   8   3   7  10
 [95]   1   8   7   2   1   9   2   5 137   4   3   0   3   5   6   7   2  17   4   4   4   4   3   2   0   2   5   1   0   3   1   2   0  10  12   3   7   5   0  11   5   4   2   2   1   0   4
[142]   6   2   3   2   7   6   2   1   5   1   9   2   0   4   0   7   3   4   4   0   0   1   0   5   8   1   2   4  10  12  15   2   5   3   2   3   1   5   0   7  67   1   2   5   1   2   0
[189]   3   6   1   7   2   1   4   2   2   8   0   3   3   7   3  16   8   6  18   2  20   5   3  28   2   6   5   5   6   1  10   2  18   7   5   7   8   6   6   7   7  10   4  17   6  12   5
[236]   5   4   4   3   6   8   2   2   0   7   6   4   3   4   4   6   3  39   4   2   1   3   2   3   1   0   6   6   0   4   1  14  13   0  25   2   4   5   5  11   2   2   0  17   6   2   3
[283]   5   7   1   4   3   2  19   2   2   6   1   1   2   1   2   1   1   4   3   0   4   3
> pal <- colorQuantile("YlOrRd", NULL, n = 9)
> pal(vals)
Error in cut.default(x, binsToUse, labels = FALSE, include.lowest = TRUE,  : 
  'breaks' are not unique

Then I looked into the source code of colorQuantile function and tried to debug it step by step.

> probs <- seq(0, 1, length.out = 10)
> probs
 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667 0.7777778 0.8888889 1.0000000
> bins <- quantile(vals, probs, na.rm = TRUE, names = FALSE)
> bins
 [1]   0   1   2   2   3   4   5   7  10 137

So this is the reason! There are 2 same quantiles causing the problem in cut. If I decrease the number of quantile to 8, then there won't be any problem. However, based on different filter criteria, query results cannot be predicted so it is difficult to predefine the number of quantiles. So what I am doing is as follows:

      probs <- seq(0, 1, length.out = quantileNum + 1)
      bins <- quantile(newObsVal, probs, na.rm = TRUE, names = FALSE)
      while (length(unique(bins)) != length(bins)) {
        quantileNum <- quantileNum - 1
        probs <- seq(0, 1, length.out = quantileNum + 1)
        bins <- quantile(newObsVal, probs, na.rm = TRUE, names = FALSE)
      }

It looks quite ugly and may lead to performance issues, but it works. After some investigation, I also discovered that it seems we can also use .bincode to categorize data:

> bins
 [1]   0   1   2   2   3   4   5   7  10 137
> .bincode(vals, bins, include.lowest = TRUE, right = FALSE)
  [1] 4 1 4 9 4 5 2 8 8 5 6 9 5 7 6 7 2 5 7 2 2 6 6 2 9 2 1 2 5 4 5 8 9 4 8 9 2 7 7 6 5 7 9 8 8 6 1 9 7 6 8 5 4 8 2 5 6 5 5 9 9 7 4 9 7 7 4 4 7 4 7 9 9 1 7 8 7 7 7 6 9 8 6 7 8 5 7 9 6 4 8 5 8 9
 [95] 2 8 8 4 2 8 4 7 9 6 5 1 5 7 7 8 4 9 6 6 6 6 5 4 1 4 7 2 1 5 2 4 1 9 9 5 8 7 1 9 7 6 4 4 2 1 6 7 4 5 4 8 7 4 2 7 2 8 4 1 6 1 8 5 6 6 1 1 2 1 7 8 2 4 6 9 9 9 4 7 5 4 5 2 7 1 8 9 2 4 7 2 4 1
[189] 5 7 2 8 4 2 6 4 4 8 1 5 5 8 5 9 8 7 9 4 9 7 5 9 4 7 7 7 7 2 9 4 9 8 7 8 8 7 7 8 8 9 6 9 7 9 7 7 6 6 5 7 8 4 4 1 8 7 6 5 6 6 7 5 9 6 4 2 5 4 5 2 1 7 7 1 6 2 9 9 1 9 4 6 7 7 9 4 4 1 9 7 4 5
[283] 7 8 2 6 5 4 9 4 4 7 2 2 4 2 4 2 2 6 5 1 6 5

So maybe we can replace cut with .bincode? However I don't know if there is any other side-effect. If we should not, what can I do to get around this issue?

Thank you very much and have a nice day.

Regards, Yang

yyz1989 avatar May 18 '15 12:05 yyz1989

If a value of 2 is passed in, which bin does it belong to? It looks like the result of .bincode is to simply throw away one of the bins, I'm not sure that's a better result than an error...?

I could maybe imagine performing your loop, with a warning message...

jcheng5 avatar May 18 '15 17:05 jcheng5

Hi @jcheng5

Thank you very much for your reply. Yeah, using .bincode will keep the problem silent, but I am also not sure about the consequence :smile:. If there is no other way to get around this issue, I also agree to throw out a warning message and try less breaks because this issue is really difficult to detect. It works for 99% of the time but only fails in really exceptional case. I guess at least it should be better to avoid throwing out an error and terminating the program when the problem is not critical. How do you think about it?

Thank you very much and have a nice day.

Regards, Yang

yyz1989 avatar May 20 '15 08:05 yyz1989

I'd like this feature as well. I think if breaks are not unique, I'd be happy to re-allocate on the fly rather than stop the program or try to program around every edge case. I mostly just want to put colors on my map.

gregmacfarlane avatar Dec 04 '17 20:12 gregmacfarlane

I'm also running into this issue, and while changing the number of breaks is a workaround, I just wanted to add another voice of support for this feature.

richpauloo avatar Jul 11 '18 22:07 richpauloo

Was this ever figured out?

sdk2116 avatar Oct 03 '18 15:10 sdk2116

Bringing attention to this topic again! Seems to still be an issue.

deidrem avatar Dec 19 '20 02:12 deidrem

Likewise adding my voice to this.

In the meantime I'm using colorNumeric() from the cases where this is an issue.

tomlincr avatar May 06 '21 12:05 tomlincr

yeah, it's been a problem for me too.

jumanbar avatar Jul 14 '21 22:07 jumanbar

Still a problem for me as well.

velicknd avatar Jul 26 '22 20:07 velicknd

I would also like for this issue to throw out a warning instead of an error as well!

knharrington avatar Jan 10 '24 17:01 knharrington

This workaround uses getJenksBreaks from BAMMtools.

This does not fix the issue with quantile, but Jenks Breaks are a commonly used bin method in GIS programs. Therefore, I think is a reasonable alternative in the meantime. That is, it puts the number of colors that I want on the map.

factor_domain <- cut(
    tb$numericvar, 
    breaks = getJenksBreaks(tb$numericvar, n_breaks), 
    include.lowest = T
)

pal <- colorFactor(
    palette = "YlOrRd",
    domain = factor_domain
)

m <- leaflet(tb) %>%
    addProviderTiles("CartoDB") %>% 
    addPolygons(
        fillColor = ~pal(factor_domain),
        weight = 2,
        opacity = 1,
        color = "white",
        fillOpacity = 0.1
    )

m

michaelpkuhn avatar Apr 27 '24 22:04 michaelpkuhn