
Error when data frame contains column with spatial geometry of SF package

Open rcepka opened this issue 1 year ago • 7 comments

Hello, I have this DF:

Rows: 9
Columns: 5
$ year        <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020
$ kraj_name   <chr> "Bratislavský kraj", "Trnavský kraj", "Trenčiansky kraj", "Nitriansky kraj", "Žilinský kraj", "Ban…
$ votes_total <dbl> 427111, 309238, 331057, 352826, 391342, 317833, 391636, 356643, 3825
$ Shape       <POLYGON [°]> POLYGON ((16.94973 48.2682,..., POLYGON ((17.31893 48.06272..., POLYGON ((18.8238 48.73498,..., PO…
$ data        <list> [<sf[24 x 5]>], [<sf[24 x 5]>], [<sf[24 x 5]>], [<sf[24 x 5]>], [<sf[24 x 5]>], [<sf[24 x 5]>],

The "Shape" column contains geo data from the SF package, which is also included in the nested data frames in the "data" column. I want to manipulate the nested data; my code is as follows:

results_voting_subjects_kraj.sf_nested_2020 <- results_voting_subjects_kraj.sf_2020 %>%
  nest(
    data = c(party_rank, political_subject, votes_received, votes_received_perc, Shape),
    .by = c(year, kraj_name, votes_total, Shape)
  ) %>%
  nplyr::nest_select(data, party_rank)

but I am getting this error:

Error: argument .nest_data msut be of class "grouped_df", "tbl_df", "tbl", "data.frame".

Without the spatial data column (that is, if I delete it before nesting the DF), the nest_select() function works fine. Is this a bug or am I doing something wrong? Many thanks...

rcepka avatar Jun 12 '23 17:06 rcepka

hi @rcepka --- nplyr checks the class of the nested column & it looks like whatever is underneath isn't a tibble/df. Can you call the class() function on the original df? I also recommend running as a reprex to help w/readability

markjrieke avatar Jun 12 '23 17:06 markjrieke

Hi Mark, thank you for the response. Here is the class of the original data frame:

> class(results_voting_subjects_kraj.sf_2020)
[1] "sf" "tbl_df" "tbl" "data.frame"

Reprex: I installed it and ran the command reprex(class(results_voting_subjects_kraj.sf_2020)), but got the following incorrect output:

class(results_voting_subjects_kraj.sf_2020)
#> Error in eval(expr, envir, enclos): object 'results_voting_subjects_kraj.sf_2020' not found

Created on 2023-06-12 with reprex v2.0.2

I guess I am using Reprex incorrectly; I will dive into it more deeply and learn it later. To save time, let's stick with my original "dirty" output for now, please :)

rcepka avatar Jun 12 '23 21:06 rcepka

Mark, you were right. I called the as_tibble() function prior to nesting the data frame and now everything works fine. Thank you so much for your help....

rcepka avatar Jun 12 '23 21:06 rcepka

May I just suggest that you add a sentence or two here explaining what was wrong and why I was getting this error? Not just for me but also for other junior R programmers, so we can understand what happened and learn from it, because even though I solved the problem, I don't know how or why it occurred. Thanks :)

rcepka avatar Jun 12 '23 21:06 rcepka

Hey @rcepka --- the functions underneath that do the nested data manipulation assume that the data being passed to them comes in the form of a data frame/tibble, so each nplyr function has an explicit check that it's getting passed a df/tibble.

I'm not super familiar with sf objects, but it looks like the example you gave is from something that has class properties as both sf and tbl. My guess is that regular dplyr functions would work on the results_voting_subjects_kraj.sf_2020; if that's the case, I'll need to update nplyr to account for this.

Can you check if regular dplyr functions work on non-nested versions of the sf obj?
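
For readers following along, here is a minimal base R sketch of how a class check can reject an object that still inherits from data.frame. It is not nplyr's actual code, and whether nplyr's check behaves this way is an assumption; the names allowed, strict_check, lenient_check, and sf_like are hypothetical, and sf_like is only a stand-in carrying the class vector reported earlier in this thread:

```r
# Hypothetical illustration only; this is NOT nplyr's real implementation.
allowed <- c("grouped_df", "tbl_df", "tbl", "data.frame")

# A check that looks only at the first (most specific) class.
strict_check <- function(x) class(x)[1] %in% allowed

# A check that accepts anything inheriting from data.frame.
lenient_check <- function(x) inherits(x, "data.frame")

# Stand-in for an sf tibble: same class vector as reported above,
# built without loading sf so the sketch runs in base R.
sf_like <- data.frame(year = 2020, votes_total = 427111)
class(sf_like) <- c("sf", "tbl_df", "tbl", "data.frame")

strict_result  <- strict_check(sf_like)   # first class is "sf", not in allowed
lenient_result <- lenient_check(sf_like)  # "data.frame" is among the classes
```

Under this assumption, calling as_tibble() before nesting would help because it drops the leading "sf" class, which matches the workaround reported above.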

markjrieke avatar Jun 19 '23 13:06 markjrieke

Hi @markjrieke, yes, dplyr functions work well with SF objects; they are supported natively. For example, when grouping and summarizing a data frame with an SF object, I don't even need to explicitly mention the column with geospatial data; it is included automatically. Actually, I need to use st_drop_geometry() if I want to further process the data frame without the SF object. Some more information: https://r-spatial.github.io/sf/articles/sf4.html
https://r-spatial.github.io/sf/reference/tidyverse.html

rcepka avatar Jun 21 '23 07:06 rcepka

gotcha --- I'll take a look when I get the chance. The articles are a helpful reference --- I'll need to get more familiar w/sf to see if adding nplyr support is possible, but I'll let you know in this issue thread if so!

markjrieke avatar Jun 22 '23 14:06 markjrieke