dm
Mention compound keys in "how-to-dm-df" tutorial
The tutorial should explain why compound keys are needed when no single column qualifies as a primary key (PK) candidate.
Motivating example:
library(dm)
library(nycflights13)
x <- new_dm(tibble::lst(weather, airports))
dm_enum_pk_candidates(x, "weather")
#> # A tibble: 15 × 3
#> columns candidate why
#> <keys> <lgl> <chr>
#> 1 origin FALSE has duplicate values: JFK (8706), LGA (8706), EWR (8703)
#> 2 year FALSE has duplicate values: 2013 (26115)
#> 3 month FALSE has duplicate values: 5 (2232), 7 (2228), 3 (2227), 1 (…
#> 4 day FALSE has duplicate values: 3 (864), 7 (864), 8 (864), 9 (864…
#> 5 hour FALSE has duplicate values: 1 (1093), 5 (1092), 8 (1092), 11 …
#> 6 temp FALSE has duplicate values: 37.94 (521), 73.94 (509), 73.04 (…
#> 7 dewp FALSE has duplicate values: 28.94 (505), 53.96 (490), 32.00 (…
#> 8 humid FALSE has duplicate values: 100.00 (286), 54.51 (59), 93.08 (…
#> 9 wind_dir FALSE has duplicate values: 310 (1341), 0 (1256), 320 (1204),…
#> 10 wind_speed FALSE has duplicate values: 9.20624 (2335), 8.05546 (2312), 6…
#> 11 wind_gust FALSE has 20778 missing values, and duplicate values: 23.0156…
#> 12 precip FALSE has duplicate values: 0.00 (24366), 0.01 (454), 0.02 (2…
#> 13 pressure FALSE has 2729 missing values, and duplicate values: 1016.2 (…
#> 14 visib FALSE has duplicate values: 10 (21847), 9 (808), 8 (581), 7 (…
#> 15 time_hour FALSE has duplicate values: 2013-01-01 01:00:00 (3), 2013-01-…
Created on 2022-07-08 by the reprex package (v2.0.1)
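A possible follow-up for the tutorial, as a minimal sketch: since no single column of `weather` qualifies, check a combination of columns and register it as a compound key. The choice of `origin` plus `time_hour` is an assumption made for illustration, not something the output above confirms, and the sketch assumes a dm version that supports compound keys in `dm_add_pk()`.

```r
library(dm)
library(nycflights13)

# Assumption: `origin` and `time_hour` together identify one weather record.
# Count rows per combination; zero rows in `dups` means the pair is unique.
dups <- dplyr::filter(dplyr::count(weather, origin, time_hour), n > 1)
nrow(dups)

# Register the column pair as a compound primary key.
# This only makes sense if the check above found no duplicates.
x <- dm(weather, airports)
x <- dm_add_pk(x, weather, c(origin, time_hour))

# List the primary keys defined in the dm object.
pks <- dm_get_all_pks(x)
pks
```

If `dups` is not empty, the chosen columns do not form a key and another combination would have to be tried.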