ggrepel

Example of algorithm not giving good results

Open MFairley opened this issue 3 years ago • 2 comments

Summary

I have an example where the labels overlap even with a large number of iterations. It should be easy to find positions that do not overlap, and I'm not sure why this isn't working. I would also suggest adding a warning when the algorithm cannot find label positions without overlaps.
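To make the suggested warning concrete, here is a minimal base R sketch. It assumes each label is an axis-aligned box described by `xmin`, `xmax`, `ymin`, `ymax`. The helper names `count_overlaps()` and `check_overlaps()` and the example boxes are hypothetical, and none of this is taken from ggrepel's source.

```r
# Hypothetical helper: count pairs of axis-aligned boxes that overlap.
# `boxes` is a data frame with columns xmin, xmax, ymin, ymax.
count_overlaps <- function(boxes) {
  n <- nrow(boxes)
  if (n < 2) return(0L)
  pairs <- utils::combn(n, 2)                 # all unordered pairs (i, j)
  a <- boxes[pairs[1, ], , drop = FALSE]
  b <- boxes[pairs[2, ], , drop = FALSE]
  # Two boxes overlap when their x ranges and their y ranges both intersect.
  hit <- a$xmin < b$xmax & b$xmin < a$xmax &
         a$ymin < b$ymax & b$ymin < a$ymax
  sum(hit)
}

# Hypothetical helper: warn when any overlapping pairs remain.
check_overlaps <- function(boxes) {
  n_overlap <- count_overlaps(boxes)
  if (n_overlap > 0) {
    warning(n_overlap, " pair(s) of labels still overlap", call. = FALSE)
  }
  invisible(n_overlap)
}

# Made-up final label positions: boxes 1 and 2 overlap, box 3 is clear.
boxes <- data.frame(
  xmin = c(0.0, 0.5, 2.0),
  xmax = c(1.0, 1.5, 3.0),
  ymin = c(0.0, 0.2, 0.0),
  ymax = c(0.4, 0.6, 0.4)
)

n_overlap <- count_overlaps(boxes)
# Capture the warning text instead of letting it propagate.
msg <- tryCatch(check_overlaps(boxes), warning = function(w) conditionMessage(w))
```

A check like this would run once after the last iteration, so it adds little cost compared with the layout loop itself.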

Minimal code example

Here is the minimum amount of code needed to demonstrate the issue:

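# Packages used below (all of them appear in the sessionInfo() output)
library(data.table)
library(ggplot2)
library(ggrepel)
library(scales)

set.seed(1)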
rnd_label <- function(n) {
  paste0(sample(LETTERS, n, replace = TRUE), collapse = "")
}
dt <- data.table(x = c(261156.9, 272704.9, 277045.2, 296642.4, 321354.4),
                 y = c(11.57584, 12.72191, 12.92108, 13.33607, 13.39414),
                 label_length = c(12, 23, 32, 35, 39))
dt[, label := lapply(label_length, rnd_label)]
ylim <- c(10, 14.0)
xlim <- c(2.5e5, 4e5)
g <- ggplot(dt, aes(x, y, label=label)) +
    geom_line() +
    geom_point() +
    scale_y_continuous(breaks = scales::breaks_pretty(n = 9), limits = ylim, name = "y") +
    scale_x_continuous(labels = scales::comma, breaks = scales::breaks_pretty(n = 9), limits = xlim, name = "x") +
    geom_label_repel(seed=1, size=3, max.iter = 200000, segment.alpha = 0.25, min.segment.length = 0.25, box.padding=1.2)
ggsave("test_repel.png", plot = g, width = 6.5, height = 4.017, units = "in")

Here is an image of the output produced by the code:

[Image: test_repel]

Version information

Here is the output from sessionInfo() in my R session:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dampack_0.2.0     devtools_2.3.0    usethis_1.6.1     stringr_1.4.0     ggrepel_0.8.2     scales_1.1.0      ggplot2_3.3.2     data.table_1.12.8 here_0.1         

loaded via a namespace (and not attached):
 [1] tidyselect_1.0.0  xfun_0.13         remotes_2.1.1     purrr_0.3.4       reshape2_1.4.4    splines_3.6.3     lattice_0.20-41   colorspace_1.4-1  vctrs_0.2.4       testthat_2.3.2    yaml_2.2.1        mgcv_1.8-31       rlang_0.4.5      
[14] pkgbuild_1.0.7    pillar_1.4.3      glue_1.4.0        withr_2.2.0       sessioninfo_1.1.1 lifecycle_0.2.0   plyr_1.8.6        munsell_0.5.0     gtable_0.3.0      memoise_1.1.0     knitr_1.28        callr_3.4.3       ps_1.3.2         
[27] fansi_0.4.1       triangle_0.12     Rcpp_1.0.4        backports_1.1.6   desc_1.2.0        pkgload_1.0.2     truncnorm_1.0-8   farver_2.0.3      fs_1.4.1          ellipse_0.4.1     packrat_0.5.0     digest_0.6.25     stringi_1.4.6    
[40] processx_3.4.2    dplyr_0.8.5       grid_3.6.3        rprojroot_1.3-2   cli_2.0.2         tools_3.6.3       magrittr_1.5      tibble_3.0.1      crayon_1.3.4      pkgconfig_2.0.3   Matrix_1.2-18     ellipsis_0.3.0    prettyunits_1.1.1
[53] assertthat_0.2.1  rstudioapi_0.11   R6_2.4.1          nlme_3.1-147      compiler_3.6.3   

MFairley, Jul 14 '20 17:07

I am (sometimes) getting similar results. What works for me is adjusting the force parameters (force, plus force_pull in the development version) and the padding parameters (point.padding, box.padding). Would it make sense to add some auto-adjustment that uses the number of overlaps as a stopping criterion? The interface could look like geom_label_repel(force = "auto").
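For readers who want to try the manual tuning described above, here is a rough sketch. The coordinates come from the original report, but the label strings and every parameter value are illustrative assumptions, not recommended settings. It also assumes a ggrepel version that includes `force_pull` (0.9.0 or later, as far as I know).

```r
library(ggplot2)
library(ggrepel)

# Coordinates from the report; the labels are made-up placeholders.
df <- data.frame(
  x = c(261156.9, 272704.9, 277045.2, 296642.4, 321354.4),
  y = c(11.57584, 12.72191, 12.92108, 13.33607, 13.39414),
  label = c("first label", "a somewhat longer label",
            "an even longer label for this point",
            "yet another fairly long label text",
            "the longest label in this small example")
)

p <- ggplot(df, aes(x, y, label = label)) +
  geom_line() +
  geom_point() +
  geom_label_repel(
    seed = 1,
    size = 3,
    force = 5,            # stronger repulsion between overlapping labels
    force_pull = 0.5,     # weaker attraction back to the data point
    box.padding = 0.5,    # padding around each label box
    point.padding = 0.25, # padding around each data point
    max.iter = 20000
  )
```

Printing `p` or passing it to `ggsave()` renders the plot. In practice the values are found by trial and error, which is the step the proposed "auto" option would replace.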

krassowski, Jul 14 '20 18:07

@MFairley Thanks for pointing this out! I think the algorithm can certainly be improved to work for this particular case. Since the two labels at the top are at exactly the same y-axis position, they don't exert a force along the y-axis on each other.

One simple way to work around this would be to use "jittered" text label coordinates instead of exact coordinates when calculating forces inside ggrepel. That would eliminate the possibility for labels to be at exactly the same y-axis position.
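The effect of jittering can be illustrated with a toy force calculation. This is only a sketch under simple assumptions (an inverse-distance repulsion between two label centres). The functions `repel_force()` and `jitter_xy()` are hypothetical and are not ggrepel's actual implementation.

```r
# Toy repulsion on label centre `a` from label centre `b`.
# The force points along (a - b), so identical y values give a zero y component.
repel_force <- function(a, b, strength = 1) {
  d <- a - b
  dist2 <- sum(d^2)
  if (dist2 == 0) return(c(x = 0, y = 0))
  strength * d / dist2
}

# Hypothetical helper: add a tiny uniform perturbation to each coordinate.
jitter_xy <- function(p, amount = 1e-4) {
  p + runif(length(p), -amount, amount)
}

a <- c(x = 0.40, y = 0.90)
b <- c(x = 0.60, y = 0.90)   # exactly the same y position as `a`

f_exact <- repel_force(a, b)   # y component is exactly zero

set.seed(1)
f_jittered <- repel_force(jitter_xy(a), jitter_xy(b))   # y component is nonzero
```

With exact coordinates the labels can only push each other sideways. After jittering, a small vertical component appears, and it can grow over the following iterations.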

Another problem is when a text label is surrounded from all sides by data points. The label can get stuck in the middle of the points instead of escaping to the outside of the point cloud.

@krassowski I'm open to any suggestions or ideas. By the way, I apologize for being slow with PRs and issues.

slowkow, Jul 14 '20 18:07