tidyr
Odd behaviour with `extract()` when some elements are absent
When the pattern does not match every element, `extract()` does not behave smoothly. Every new column becomes `NA` for rows without a full match.
I think it would be better to at least have a warning for that. I wonder if it would be possible to give `extract()` arguments like `fill` and `extra`, similar to `separate()`.
I know it is easy to solve with dplyr and a couple more steps, but I think that `extract()` provides a clean way to perform this task.
dat <- tibble::tibble(
  x = c("foo (1)", "foo1", "foo2 (2)", "foo3 (3)")
)
dat |>
  tidyr::extract(
    x,
    into = c("x1", "x2"),
    regex = "(.+) \\((\\d+)\\)",
    convert = TRUE
  )
#> # A tibble: 4 x 2
#>   x1       x2
#>   <chr> <int>
#> 1 foo       1
#> 2 <NA>     NA
#> 3 foo2      2
#> 4 foo3      3
# Using separate and its extra arguments can work for that.
dat |>
  tidyr::separate(
    x,
    into = c("x1", "x2"),
    sep = " "
  )
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [2].
#> # A tibble: 4 x 2
#>   x1    x2
#>   <chr> <chr>
#> 1 foo   (1)
#> 2 foo1  <NA>
#> 3 foo2  (2)
#> 4 foo3  (3)
Created on 2022-06-29 by the reprex package (v2.0.1)
You can silence the `separate()` warning by specifying `fill = "right"`. Maybe a similar argument could be useful in `extract()`.
Expected output
# With a warning
#> # A tibble: 4 x 2
#>   x1    x2
#>   <chr> <chr>
#> 1 foo   1
#> 2 foo1  <NA>
#> 3 foo2  2
#> 4 foo3  3
# with code that would look like this (`fill` is the proposed argument; it does not exist in extract() today).
dat |>
  tidyr::extract(
    x,
    into = c("x1", "x2"),
    regex = "(.+) \\((\\d+)\\)",
    convert = TRUE,
    fill = "right"
  )
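For reference, the dplyr workaround mentioned earlier could look roughly like the sketch below. It is only one possible approach, not something tidyr provides: it keeps the original column with `remove = FALSE`, then uses `dplyr::coalesce()` to fall back to the raw string wherever the regex did not match. The name `res` is just illustrative.

```r
library(dplyr)
library(tidyr)

dat <- tibble::tibble(
  x = c("foo (1)", "foo1", "foo2 (2)", "foo3 (3)")
)

res <- dat |>
  # Keep `x` so the unmatched rows can be recovered afterwards.
  tidyr::extract(
    x,
    into = c("x1", "x2"),
    regex = "(.+) \\((\\d+)\\)",
    convert = TRUE,
    remove = FALSE
  ) |>
  # Where the regex did not match, x1 is NA: fall back to the original string.
  dplyr::mutate(x1 = dplyr::coalesce(x1, x)) |>
  dplyr::select(-x)

res
```

This gives the values shown under "Expected output" (with `x2` as integer because of `convert = TRUE`), but it takes two extra steps and emits no warning about the unmatched row, which is why a built-in argument would be nicer.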