dm icon indicating copy to clipboard operation
dm copied to clipboard

count() doesn't keep primary keys

Open krlmlr opened this issue 1 year ago • 6 comments

I tried implementing dplyr_reconstruct(), with no success so far.

library(conflicted)
library(dm)
library(tidyverse)

# pkgload::load_all()
library(dm)
library(dplyr)

dm <- dm_nycflights13()

tbls <-
  dm %>%
  dm_get_tables(keyed = TRUE)

flights <- tbls$flights
airports <- tbls$airports

by_origin <-
  flights %>%
  count(origin)

by_origin
#> # A tibble: 3 × 2
#> # Keys:     — | 0 | 4
#>   origin     n
#>   <chr>  <int>
#> 1 EWR      641
#> 2 JFK      602
#> 3 LGA      518

# PK for by_origin is missing
dm(flights, airports, by_origin) %>%
  dm_get_all_pks()
#> # A tibble: 1 × 2
#>   table    pk_col
#>   <chr>    <keys>
#> 1 airports faa

Created on 2022-07-21 by the reprex package (v2.0.1)

krlmlr avatar Jul 21 '22 04:07 krlmlr

The PR above can be extended.

More action items:

  • rename() can't currently work, need rename.keyed_tbl()
  • check all other verbs

krlmlr avatar Aug 15 '23 04:08 krlmlr

FWIW, count() seems to behave properly when using a dm_zoomed:

suppressPackageStartupMessages({
  library(dm)
  library(dplyr)
})

dm_nycflights13() %>% dm_zoom_to(airlines) %>% count(carrier) %>% dm_update_zoomed() %>% dm_get_all_pks()
#> # A tibble: 4 × 3
#>   table    pk_col            autoincrement
#>   <chr>    <keys>            <lgl>        
#> 1 airlines carrier           FALSE        
#> 2 airports faa               FALSE        
#> 3 planes   tailnum           FALSE        
#> 4 weather  origin, time_hour FALSE
dm_nycflights13() %>% dm_zoom_to(airlines) %>% count(name) %>% dm_update_zoomed() %>% dm_get_all_pks()
#> # A tibble: 3 × 3
#>   table    pk_col            autoincrement
#>   <chr>    <keys>            <lgl>        
#> 1 airports faa               FALSE        
#> 2 planes   tailnum           FALSE        
#> 3 weather  origin, time_hour FALSE

Created on 2023-08-21 with reprex v2.0.2

TSchiefer avatar Aug 21 '23 14:08 TSchiefer

Would you expect the columns used for count() to constitute a new PK? I would think that in your example no PK was actually dropped, but a potential one wasn't set.

TSchiefer avatar Aug 21 '23 14:08 TSchiefer

This is about dm_get_tables(keyed = TRUE) .

krlmlr avatar Aug 21 '23 14:08 krlmlr

Yes, but I am not certain which behavior you'd like. If you use count() on a PK in a dm_keyed_tbl it will be kept as a PK:

library(conflicted)
library(dm)
library(tidyverse)

# pkgload::load_all()
library(dm)
library(dplyr)

dm <- dm_nycflights13()

tbls <-
  dm %>%
  dm_get_tables(keyed = TRUE)

flights <- tbls$flights
airports <- tbls$airports

by_faa <-
  airports %>%
  count(faa)

by_faa
#> # A tibble: 86 × 2
#> # Keys:     `faa` | 1 | 0
#>    faa       n
#>    <chr> <int>
#>  1 ALB       1
#>  2 ATL       1
#>  3 AUS       1
#>  4 BDL       1
#>  5 BHM       1
#>  6 BNA       1
#>  7 BOS       1
#>  8 BTV       1
#>  9 BUF       1
#> 10 BUR       1
#> # ℹ 76 more rows

# PK for by_origin is missing
dm(flights, airports, by_faa) %>%
  dm_get_all_pks()
#> # A tibble: 2 × 3
#>   table    pk_col autoincrement
#>   <chr>    <keys> <lgl>        
#> 1 airports faa    FALSE        
#> 2 by_faa   faa    FALSE

Created on 2023-08-21 with reprex v2.0.2

This seems consistent to me with the behavior for dm_zoomed. We can also discuss some other time.

TSchiefer avatar Aug 21 '23 14:08 TSchiefer

In the original post, I expect by_origin to have origin as a new primary key.

krlmlr avatar Aug 22 '23 09:08 krlmlr