dm
dm copied to clipboard
count() doesn't keep primary keys
I tried implementing dplyr_reconstruct()
, with no success so far.
library(conflicted)
library(dm)
library(tidyverse)
# pkgload::load_all()
library(dm)
library(dplyr)
dm <- dm_nycflights13()
tbls <-
dm %>%
dm_get_tables(keyed = TRUE)
flights <- tbls$flights
airports <- tbls$airports
by_origin <-
flights %>%
count(origin)
by_origin
#> # A tibble: 3 × 2
#> # Keys: — | 0 | 4
#> origin n
#> <chr> <int>
#> 1 EWR 641
#> 2 JFK 602
#> 3 LGA 518
# PK for by_origin is missing
dm(flights, airports, by_origin) %>%
dm_get_all_pks()
#> # A tibble: 1 × 2
#> table pk_col
#> <chr> <keys>
#> 1 airports faa
Created on 2022-07-21 by the reprex package (v2.0.1)
The PR above can be extended.
More action items:
-
rename()
can't currently work, needrename.keyed_tbl()
- check all other verbs
FWIW, count()
seems to behave properly when using a dm_zoomed
:
suppressPackageStartupMessages({
library(dm)
library(dplyr)
})
dm_nycflights13() %>% dm_zoom_to(airlines) %>% count(carrier) %>% dm_update_zoomed() %>% dm_get_all_pks()
#> # A tibble: 4 × 3
#> table pk_col autoincrement
#> <chr> <keys> <lgl>
#> 1 airlines carrier FALSE
#> 2 airports faa FALSE
#> 3 planes tailnum FALSE
#> 4 weather origin, time_hour FALSE
dm_nycflights13() %>% dm_zoom_to(airlines) %>% count(name) %>% dm_update_zoomed() %>% dm_get_all_pks()
#> # A tibble: 3 × 3
#> table pk_col autoincrement
#> <chr> <keys> <lgl>
#> 1 airports faa FALSE
#> 2 planes tailnum FALSE
#> 3 weather origin, time_hour FALSE
Created on 2023-08-21 with reprex v2.0.2
Would you expect the columns used for count()
to constitute a new PK? I would think that in your example no PK was actually dropped, but a potential one wasn't set.
This is about dm_get_tables(keyed = TRUE)
.
Yes, but I am not certain which behavior you'd like. If you use count()
on a PK in a dm_keyed_tbl
it will be kept as a PK:
library(conflicted)
library(dm)
library(tidyverse)
# pkgload::load_all()
library(dm)
library(dplyr)
dm <- dm_nycflights13()
tbls <-
dm %>%
dm_get_tables(keyed = TRUE)
flights <- tbls$flights
airports <- tbls$airports
by_faa <-
airports %>%
count(faa)
by_faa
#> # A tibble: 86 × 2
#> # Keys: `faa` | 1 | 0
#> faa n
#> <chr> <int>
#> 1 ALB 1
#> 2 ATL 1
#> 3 AUS 1
#> 4 BDL 1
#> 5 BHM 1
#> 6 BNA 1
#> 7 BOS 1
#> 8 BTV 1
#> 9 BUF 1
#> 10 BUR 1
#> # ℹ 76 more rows
# PK for by_origin is missing
dm(flights, airports, by_faa) %>%
dm_get_all_pks()
#> # A tibble: 2 × 3
#> table pk_col autoincrement
#> <chr> <keys> <lgl>
#> 1 airports faa FALSE
#> 2 by_faa faa FALSE
Created on 2023-08-21 with reprex v2.0.2
This seems consistent to me with the behavior for dm_zoomed
. We can also discuss some other time.
In the original post, I expect by_origin
to have origin
as a new primary key.