
An issue of too much space between words

Open recordyao opened this issue 3 years ago • 11 comments

Dear creator

I have a problem while using geom_text_wordcloud. The word spacing is too large. That happens even when I use exactly the same code as yours. I found two questions online regarding the same issue, but didn't find a solution. Can you let me know why? Thank you!

This is the desired result, but produced with the wordcloud() command: [image: Screen Shot 2021-12-10 at 9 55 13 AM]

This is the undesired result, with too much spacing, when using geom_text_wordcloud with ggplot(): [image: Screen Shot 2021-12-10 at 9 56 44 AM]

recordyao avatar Dec 10 '21 01:12 recordyao

I have this problem, too!

dernesa avatar Dec 22 '21 12:12 dernesa

Could you send me the code as well as the platform you are using? This looks like a difference between the font used to compute the boxes and the one finally used for rendering...
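The suspected font mismatch can be illustrated outside ggwordcloud. The sketch below is only an illustration under stated assumptions: it uses the grid package and an off-screen PDF device, the helper name `label_width_in` is hypothetical, and the families "sans" and "mono" are stand-ins for "the font used to measure" and "the font used to draw". It is not how ggwordcloud computes its boxes.

```r
library(grid)

# Measure the rendered width (in inches) of a label for a given font family.
# pdf(NULL) opens an off-screen device so that grid has font metrics to query.
label_width_in <- function(label, family, fontsize = 12) {
  pdf(NULL)
  on.exit(dev.off())
  grob <- textGrob(label, gp = gpar(fontfamily = family, fontsize = fontsize))
  convertWidth(grobWidth(grob), "inches", valueOnly = TRUE)
}

# Same word, same font size, two different families.
w_sans <- label_width_in("love", "sans")
w_mono <- label_width_in("love", "mono")
c(sans = w_sans, mono = w_mono)
```

If the placement step reserves room using one of these widths while the device draws with the other, each word ends up with too much or too little space around it.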


lepennec avatar Dec 22 '21 22:12 lepennec

Hi, I created a reprex. See below! I also noticed that ggwordcloud (v0.5.0.9000) seems to work, while ggwordcloud_0.5.0 produces the output below.

I do not select a specific font, but I have problems with the non-Latin characters in your examples, so I only use Latin characters in my example.

I hope this helps.

cheers

library(tidyverse)
library(ggwordcloud)

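# Keep only words made of Latin letters, then take the first 30 rows
normal_font_love <- love_words %>% 
  filter(grepl("^[a-z]*$", word, ignore.case = TRUE)) %>% 
  .[1:30, ]

# Fix the seed so the word placement is reproducible
set.seed(42)
p <- ggplot(normal_font_love, aes(label = word, size = speakers)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 30) +
  theme_minimal()

# Save the plot explicitly rather than relying on the last plot created
ggsave("love_words_small_R.png",
       plot = p,
       height = 5,
       width = 10)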


sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] ggwordcloud_0.5.0 forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
#>  [5] purrr_0.3.4       readr_2.0.2       tidyr_1.1.4       tibble_3.1.6     
#>  [9] ggplot2_3.3.3     tidyverse_1.3.1  
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.1 xfun_0.28        haven_2.4.3      colorspace_2.0-2
#>  [5] vctrs_0.3.8      generics_0.1.1   htmltools_0.5.2  yaml_2.2.1      
#>  [9] utf8_1.2.2       rlang_0.4.12     pillar_1.6.4     glue_1.5.0      
#> [13] withr_2.4.2      DBI_1.1.1        dbplyr_2.1.1     modelr_0.1.8    
#> [17] readxl_1.3.1     lifecycle_1.0.1  cellranger_1.1.0 munsell_0.5.0   
#> [21] gtable_0.3.0     rvest_1.0.2      evaluate_0.14    labeling_0.4.2  
#> [25] knitr_1.36       tzdb_0.2.0       fastmap_1.1.0    fansi_0.5.0     
#> [29] highr_0.9        Rcpp_1.0.7       broom_0.7.10     backports_1.3.0 
#> [33] scales_1.1.1     jsonlite_1.7.2   farver_2.1.0     fs_1.5.0        
#> [37] png_0.1-7        hms_1.1.1        digest_0.6.28    stringi_1.7.5   
#> [41] grid_4.1.2       cli_3.1.0        tools_4.1.2      magrittr_2.0.1  
#> [45] crayon_1.4.2     pkgconfig_2.0.3  ellipsis_0.3.2   xml2_1.3.2      
#> [49] reprex_2.0.1     lubridate_1.8.0  assertthat_0.2.1 rmarkdown_2.11  
#> [53] httr_1.4.2       rstudioapi_0.13  R6_2.5.1         compiler_4.1.2

Created on 2021-12-23 by the reprex package (v2.0.1)

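With the following output: [image: love_words_small_R] https://user-images.githubusercontent.com/24799198/147165339-f37bc827-7d32-4c29-9ad0-768f8f832c12.png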

dernesa avatar Dec 22 '21 23:12 dernesa

Thank you. I know more or less what's going on... It seems that nothing is working as planned when computing the text masks, and very crude rectangular bounding boxes are used instead. I do not have access to a Mac, but I will see if I can find a workaround. I do not plan to do it during the holidays, but I will try to do this as soon as possible.
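Why falling back from text masks to rectangular bounding boxes spreads words apart can be seen on a toy example. The sketch below is an assumption-laden illustration in base R, not ggwordcloud's implementation: the 6 x 6 grid, the two shapes, and the helper names `bbox`, `bbox_overlap`, and `mask_overlap` are all made up for the demonstration.

```r
# Two toy "glyph" masks on a shared 6 x 6 pixel grid (TRUE = inked pixel).
grid_size <- 6
mask_a <- matrix(FALSE, grid_size, grid_size)
mask_b <- matrix(FALSE, grid_size, grid_size)

# Shape A: an "L" made of column 1 (rows 1..4) and row 4 (columns 1..4).
mask_a[1:4, 1] <- TRUE
mask_a[4, 1:4] <- TRUE

# Shape B: a small block tucked into the empty corner of the "L".
mask_b[1:2, 3:4] <- TRUE

# Bounding box of a mask as c(row_min, row_max, col_min, col_max).
bbox <- function(mask) {
  idx <- which(mask, arr.ind = TRUE)
  c(range(idx[, "row"]), range(idx[, "col"]))
}

# Rectangles overlap when both their row ranges and column ranges intersect.
bbox_overlap <- function(a, b) {
  a[1] <= b[2] && b[1] <= a[2] && a[3] <= b[4] && b[3] <= a[4]
}

# Masks overlap only when some pixel is inked in both.
mask_overlap <- function(a, b) any(a & b)

boxes_collide <- bbox_overlap(bbox(mask_a), bbox(mask_b))
masks_collide <- mask_overlap(mask_a, mask_b)
c(boxes = boxes_collide, masks = masks_collide)
```

With the box test, shape B would be rejected at this position and pushed further out, even though no inked pixels touch. Repeated over many words, that kind of rejection produces a much looser cloud.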


lepennec avatar Dec 22 '21 23:12 lepennec

Hi,

Can you try with the latest version available on GitHub?

Yours
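For readers following along, one way to check which build is installed and to fetch the GitHub version is sketched below. The helper `is_dev_version` is hypothetical, and the ".9000" suffix is a common convention for development builds rather than something R enforces. The install line assumes the remotes package and uses the repository path that appears in this thread's URLs.

```r
# TRUE when a version string carries a development suffix such as 0.5.0.9000.
is_dev_version <- function(v) {
  parts <- unlist(package_version(v))
  length(parts) >= 4 && parts[4] >= 9000
}

is_dev_version("0.5.0")       # release-style version
is_dev_version("0.5.0.9000")  # development-style version

# Report the installed version, if the package is present.
if (requireNamespace("ggwordcloud", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("ggwordcloud"))
  message("ggwordcloud ", installed,
          " (development build: ", is_dev_version(installed), ")")
}

# To install from GitHub (not run here; needs 'remotes' and network access):
# remotes::install_github("lepennec/ggwordcloud")
```

Restart the R session after installing so that the new version is the one actually loaded.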

On Fri, Sep 9, 2022 at 7:18 PM Carlos López de la Cerda < @.***> wrote:

Hi @lepennec!

I have the same problem as @dernesa. Did you find any workaround?

lepennec avatar Sep 09 '22 21:09 lepennec

Hi, I wonder if there is a solution to this problem. I'm having the same problem and have installed the development version, but there is still too much space between words. Could you help me fix this? Thank you!

mengliuveronica avatar Oct 17 '22 09:10 mengliuveronica

Are you also using OS X?
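Since the bug appears to be platform-specific, a compact report can be easier to read than a full sessionInfo() dump. This is a minimal sketch in base R; the function name `platform_report` and the choice of fields are assumptions, not something the maintainer asked for.

```r
# Collect the details most relevant to a platform-specific rendering bug.
platform_report <- function() {
  sysname <- Sys.info()[["sysname"]]
  list(
    os        = sysname,                     # "Darwin" on macOS
    platform  = R.version$platform,          # e.g. "aarch64-apple-darwin20"
    r_version = as.character(getRversion()),
    is_macos  = identical(sysname, "Darwin")
  )
}

report <- platform_report()
str(report)
```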


lepennec avatar Oct 17 '22 09:10 lepennec

Yes! This is my session_info in case it's helpful:

R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggwordcloud_0.6.0 ggthemes_4.2.4 rio_0.5.29 forcats_0.5.1
[6] stringr_1.4.0 dplyr_1.0.9 purrr_0.3.5 readr_2.1.2 tidyr_1.2.0
[11] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.2 pacman_0.5.1

mengliuveronica avatar Oct 17 '22 09:10 mengliuveronica

Thank you. I need to have access to a macOS machine to understand exactly what is going on...


lepennec avatar Oct 17 '22 12:10 lepennec

Hi everyone. I managed to use the developer version [0.6.0] and it fixed the spacing issue.

I had resolved the non-Latin font issue by following this great explanation on Stack Overflow: https://stackoverflow.com/questions/74415534/why-do-characters-from-foreign-alphabets-not-show-in-my-wordcloud-on-r

One problem I do see now is word sizes. 愛, with 1200 speakers, looks almost four times as big as "Love", with 800 speakers. FYI @lepennec. If you want me to open a new issue, let me know.

[image: love_words_small_wordsize] https://user-images.githubusercontent.com/65240598/255545775-552897df-4f1e-4bbd-babf-390abbef589b.png
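One possible contributing factor, offered only as a hedged back-of-the-envelope check: geom_text_wordcloud_area() aims to make the printed area of a word, not its font size, track the size aesthetic. Under the crude assumption that a word's area grows like (number of characters) x (font size)^2, a one-character word needs a much larger font than a four-character word to cover a comparable area. The function `relative_font_size` below is hypothetical and is not ggwordcloud's actual formula.

```r
# Crude model: area ~ n_chars * font_size^2, so for area proportional to value,
# font_size is proportional to sqrt(value / n_chars).
relative_font_size <- function(value, n_chars) sqrt(value / n_chars)

# Values taken from the comment above: 1200 speakers for the one-character word,
# 800 speakers for the four-character word "Love".
size_one_char  <- relative_font_size(1200, 1)
size_four_char <- relative_font_size(800, 4)

# Font-size ratio with the per-character area correction.
ratio_with_correction <- size_one_char / size_four_char

# Font-size ratio if only the values were compared (no character count).
ratio_without_correction <- sqrt(1200 / 800)

c(with = ratio_with_correction, without = ratio_without_correction)
```

Under this model the single character is drawn at roughly 2.4 times the font size of "Love", compared with about 1.2 times without the correction. Whether that accounts for the difference seen in the plot would need to be confirmed against the package itself.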

Below is the script and the session info.

library(tidyverse) #> Loading required package: ggplot2
library(ggwordcloud)
library(showtext)

# Find usable Font
(where <- font_files()[which(str_detect(font_files()$family, "Arial Unicode MS")), ])
# add the font to the workspace
font_add(family = where[1, ]$family, regular = where[1, ]$file)
showtext_auto()

# Load data
data("love_words_small")

set.seed(42)
# Wordcloud with size attribute
ggplot(data = love_words_small, aes(label = word, size = speakers)) +
  geom_text_wordcloud_area(
    # family name of the font
    family = where[1, ]$family) +
  scale_size_area(max_size = 24) +
  theme_minimal()

sessionInfo()
# R version 4.3.0 (2023-04-21)
# Platform: aarch64-apple-darwin20 (64-bit)
# Running under: macOS Ventura 13.4.1

# Matrix products: default
# BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
# LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# time zone: Europe/London
# tzcode source: internal

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
#  [1] showtext_0.9-6    showtextdb_3.0    sysfonts_0.8.8    ggwordcloud_0.6.0 lubridate_1.9.2  
#  [6] forcats_1.0.0     stringr_1.5.0     dplyr_1.1.2       purrr_1.0.1       readr_2.1.4      
# [11] tidyr_1.3.0       tibble_3.2.1      ggplot2_3.4.2     tidyverse_2.0.0  

# loaded via a namespace (and not attached):
#  [1] gtable_0.3.3     compiler_4.3.0   tidyselect_1.2.0 Rcpp_1.0.11      xml2_1.3.5       scales_1.2.1    
#  [7] png_0.1-8        R6_2.5.1         labeling_0.4.2   commonmark_1.9.0 generics_0.1.3   munsell_0.5.0   
# [13] pillar_1.9.0     tzdb_0.4.0       rlang_1.1.1      utf8_1.2.3       stringi_1.7.12   xfun_0.39       
# [19] timechange_0.2.0 cli_3.6.1        withr_2.5.0      magrittr_2.0.3   grid_4.3.0       gridtext_0.1.5  
# [25] rstudioapi_0.14  markdown_1.7     hms_1.1.3        lifecycle_1.0.3  vctrs_0.6.3      glue_1.6.2      
# [31] farver_2.1.1     fansi_1.0.4      colorspace_2.1-0 tools_4.3.0      pkgconfig_2.0.3 

kclo22 avatar Jul 24 '23 09:07 kclo22

Let me see what I can do...


lepennec avatar Jul 24 '23 11:07 lepennec