rgeoda
Skater crashing R
I updated to the latest version (0.10.4) because the "maxp_greedy" function was crashing R (see issue #39), but it seems that the bug wasn't fixed for the "skater" function.
Hey @lixun910, is there an ETA on this?
Which OS did you use? SKATER seems to work fine on my macOS machine… Thanks!
I am using RStudio Server on Ubuntu 20.04. Interestingly, things work as expected when I reduce the number of rows in the data! Is there a limit on how much data the rgeoda::skater function can handle? I have ~1800 observations in total.
There is no limit on the data size. I think something else may be causing the crash, such as invalid values or the connectivity structure. Could you share your data and steps with me so I can replicate it? Thanks!
@lixun910, here's a reprex:
# tigris version 2.0.1
# rgeoda version 0.0.10.4
# dplyr version 1.1.1
# sf version 1.0.12
set.seed(100)
ca_zctas <- tigris::zctas(year = 2010, state = "CA") |>
  dplyr::mutate(value = rexp(dplyr::n()))
ca_queen_w <- rgeoda::queen_weights(ca_zctas)
ca_zcta_clusters <- rgeoda::skater(
  5, ca_queen_w, dplyr::select(ca_zctas, value)
)
ca_zcta_clusters
Running the above code block causes R to crash! Also, here's the session info:
─ Session info ───────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.3 (2023-03-15)
os Ubuntu 20.04.5 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2023-12-11
rstudio 2023.03.0+386 Cherry Blossom (server)
pandoc 3.1.2 @ /usr/bin/ (via rmarkdown)
@lixun910, when do you anticipate this will get fixed? Just curious!
Thanks for checking @ashirwad! I checked your data and noticed that the connectivity of the queen weights is incomplete, since there are many islands in this dataset. We should give a warning instead of a hard crash. In the meantime, you can try using, e.g., KNN weights with SKATER. I will fix this hard crash in the next release and will keep you updated.
Thanks, @lixun910, for the advice! I will try using KNN weights.
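For anyone landing here with the same crash, the workaround above can be sketched as follows. This is only a minimal sketch, not the fix planned for the package: it uses the Guerry sample data that ships with rgeoda so it runs without a download, and the fallback `k = 6` and the cluster count of 4 are arbitrary placeholders. Note that `has_isolates()` only flags observations with no neighbors at all; it does not prove that the whole graph is connected.

```r
library(sf)
library(rgeoda)

# Sample polygons bundled with rgeoda (an assumption for illustration;
# substitute your own sf object, e.g. ca_zctas from the reprex).
guerry <- st_read(
  system.file("extdata", "Guerry.shp", package = "rgeoda"),
  quiet = TRUE
)

queen_w <- queen_weights(guerry)

# Fall back to KNN weights when the queen weights contain islands
# (observations without any neighbor). k = 6 is a placeholder value.
w <- if (has_isolates(queen_w)) knn_weights(guerry, 6) else queen_w

# Cluster on a single variable, mirroring the reprex.
clusters <- skater(4, w, guerry["Crm_prs"])
table(clusters$Clusters)
```

With the data from the reprex, the equivalent change would be to pass `rgeoda::knn_weights(ca_zctas, 6)` to `rgeoda::skater()` in place of `ca_queen_w`.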
@lixun910, is there a rule of thumb for selecting the value for k in KNN weights, or is it arbitrary?
The value of k really depends on your data and on how the weights will be used. You can try the GeoDa desktop software to check and explore the connectivity map/graph for different k values, see https://geodacenter.github.io/workbook/4a_contig_weights/lab4a.html#fig:contigmapselect. For spatial clustering, different weights can lead to different connectivity graphs and therefore different results. But at a minimum, we need weights that generate a complete connectivity graph. Hope this info helps. Thanks!
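One way to compare candidate k values against the "complete connectivity graph" requirement is to count the connected components of the neighbor graph and look for the smallest k that gives exactly one. The helper below is a hypothetical base R sketch, not part of rgeoda. It assumes the neighbors have already been extracted into a plain list of integer index vectors, and it treats links as undirected, which is an assumption on my part since KNN links are not necessarily mutual.

```r
# Count connected components of a neighbor graph.
# `neighbors[[i]]` holds the 1-based indices of the neighbors of observation i.
count_components <- function(neighbors) {
  n <- length(neighbors)

  # Build an undirected adjacency list (add each link in both directions).
  adj <- vector("list", n)
  for (i in seq_len(n)) {
    for (j in neighbors[[i]]) {
      adj[[i]] <- c(adj[[i]], j)
      adj[[j]] <- c(adj[[j]], i)
    }
  }

  component <- integer(n)  # 0 means "not visited yet"
  n_components <- 0L
  for (start in seq_len(n)) {
    if (component[start] != 0L) next
    n_components <- n_components + 1L
    component[start] <- n_components
    queue <- start
    # Breadth-first search from `start`.
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      for (nb in unique(adj[[node]])) {
        if (component[nb] == 0L) {
          component[nb] <- n_components
          queue <- c(queue, nb)
        }
      }
    }
  }
  n_components
}

# Toy graph: a chain 1-2-3, a pair 4-5, and an island 6.
toy <- list(2L, c(1L, 3L), 2L, 5L, 4L, integer(0))
n_toy <- count_components(toy)
```

A result greater than 1 means the weights do not give a complete connectivity graph; here the toy graph has three components. I believe `rgeoda::get_neighbors()` can be used to build the input list from a weights object, but please check the package documentation before relying on that.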
@lixun910, thanks for the ideas!