Which functions should preserve objects' attributes?

Open gagolews opened this issue 10 years ago • 7 comments

~~Currently the only function that preserves a selected subset of the input object's attributes is stri_sort~~ (see #63)

Which other functions should preserve the attributes? Which attributes should be preserved (names, ...)? If there is more than one parameter, what should the attribute selection strategy be?

gagolews, Mar 19 '14 18:03

dim, names and dimnames? See mostattributes in ?attributes.
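
As a rough illustration of what `mostattributes<-` does, here is a minimal base R sketch. The helper `restore_attrs()` is hypothetical (it is not part of stringi), and `as.vector(trimws(m))` is only a stand-in for any function that returns a bare vector.

```r
# Hypothetical helper (not part of stringi): copy attributes from `input`
# onto `result`. `mostattributes<-` only sets dim, names and dimnames
# when they are valid for `result`.
restore_attrs <- function(result, input) {
  mostattributes(result) <- attributes(input)
  result
}

m <- matrix(c(" a", "b ", " c ", "d"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# Stand-in (assumption) for a string function that drops all attributes.
stripped <- as.vector(trimws(m))

# Lengths match, so dim and dimnames are restored.
restored <- restore_attrs(stripped, m)

# Length 3 does not match prod(dim(m)), so dim and dimnames are skipped
# instead of raising an error, which `attributes<-` would do.
shorter <- restore_attrs(stripped[1:3], m)
```

The appeal of this approach is that mismatched lengths degrade to a plain vector rather than failing.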

gagolews, Mar 20 '14 22:03

It feels like stri_replace_* should definitely keep all attributes, since it's reasonable to think of it as modifying the contents of an existing vector.

stri_sort() is harder. In base R:

a <- matrix(1:6, nrow = 3)
sort(a)
#> [1] 1 2 3 4 5 6
b <- c(x = 2, y = 1)
sort(b)
#> y x 
#> 1 2

And ?sort has:

All attributes are removed from the return value (see Becker et al, 1988, p.146) except names, which are sorted. (If partial is specified even the names are removed.) Note that this means that the returned value has no class, except for factors and ordered factors (which are treated specially and whose result is transformed back to the original class).

hadley, Oct 30 '15 13:10

Maybe the only attribute that should be systematically preserved is names? The trickiest case is *_all(simplify = TRUE) where the names would become row names.
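
A base R analogue may make the tricky case concrete. This sketch uses `strsplit()` and `rbind()` rather than any stringi function, and assumes every element splits into the same number of pieces.

```r
# Named input vector; the names are what we would like to keep.
x <- c(first = "a,b", second = "c,d")

# strsplit() returns a list that carries over the names of `x`.
pieces <- strsplit(x, ",", fixed = TRUE)

# Simplifying the list to a matrix: one row per input element,
# so the element names naturally turn into row names.
mat <- do.call(rbind, pieces)
```

Here `rownames(mat)` holds the original names, which is the behaviour a simplified `*_all()` result would presumably need to mirror.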

hadley, Nov 04 '15 14:11

Should stri_trim (and friends) preserve matrices (like base::trimws)?

I recently had this type of use case:

library(readr)
library(stringr)
read_lines(readr_example("mtcars.csv")) %>%
  stringr::str_split_fixed(",", n = 11) %>%  # returns a matrix
  # stringr::str_trim()                      # returns a vector, not wanted
  base::trimws()                             # returns a matrix
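
Until the trimming functions keep dimensions, one possible workaround is empty-subscript assignment, which replaces the values but leaves `dim` alone. This is only a sketch: `trim_to_vector()` is a hypothetical stand-in for a trimming function that returns a plain vector, and could be swapped for the real one.

```r
m <- matrix(c(" mpg ", " cyl", "21 ", " 6 "), nrow = 2, byrow = TRUE)

# Stand-in (assumption) for a trimming function that drops dim.
trim_to_vector <- function(x) as.vector(trimws(x))

# `trimmed[] <- value` overwrites the contents in place,
# so the matrix shape of `m` is retained.
trimmed <- m
trimmed[] <- trim_to_vector(m)
```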

t-kalinowski, Aug 21 '16 15:08

Here is another use case where it makes sense to preserve names: https://github.com/Tazinho/snakecase/issues/93

I decided to preserve them within the snakecase package now, but noticed that this is not consistent with stringr::str_to_lower etc. and the underlying stringi functions. So I'd like to suggest this change at least for stringi::stri_trans_tolower(), stringi::stri_trans_totitle(), stringi::stri_trans_toupper().
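
For reference, preserving names on the caller's side can be as small as the following sketch. `lower_no_names()` is a hypothetical stand-in for a case-conversion function that drops names. It is not the stringi implementation.

```r
x <- c(id = "Some Name", grp = "Other Name")

# Stand-in (assumption) for a case-conversion function that drops names.
lower_no_names <- function(s) unname(tolower(s))

# Reattach the input's names to the converted values.
out <- stats::setNames(lower_no_names(x), names(x))
```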

Tazinho, Sep 10 '17 11:09

Came here from the issue referenced above, and agree with @hadley's suggestion to at least preserve names. I hit this when using a named vector as input to labels in a ggplot2 scale, which will match by name if the vector is named. I find this a pretty useful feature in general.

However, when I decided to stringr::str_wrap the labels, it drops the names and fails silently 😭 as scale_* falls back to vector order, matching the wrong labels to the data.

econandrew, Feb 10 '18 17:02

I think the only attributes you need to preserve are names. You could choose to preserve dims, but they are rarely used (and would require more thought). You don't know how any other attributes relate to the data, so it's best to leave it to the class author to handle with S3 dispatch (i.e. fixing this for general objects would require stringi functions to become S3 generics, which (IMO) is outside the scope of this issue).

stri_sort() and stri_subset() could use [, but don't. I think that's a reasonable design choice (favouring performance over S3 dispatch), so you could just document that if you want to preserve class/attributes, you should combine stri_order() and stri_detect() with [ yourself.
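
The suggested combination might look like the sketch below. It uses `stringi::stri_order()` and `stringi::stri_detect_regex()` when stringi is installed and, as an assumption made only to keep the example runnable anywhere, falls back to base `order()` and `grepl()` otherwise.

```r
# Named character vector; `[` keeps the names of the selected elements.
x <- c(b = "pear", a = "apple", c = "fig")

# Assumption: fall back to base R when stringi is not installed.
has_stringi <- requireNamespace("stringi", quietly = TRUE)

ord  <- if (has_stringi) stringi::stri_order(x) else order(x)
keep <- if (has_stringi) stringi::stri_detect_regex(x, "p") else grepl("p", x)

sorted  <- x[ord]   # sorted values, each still paired with its name
matched <- x[keep]  # elements containing "p", names intact
```

Because the subsetting goes through `[`, a class that defines its own `[` method gets the chance to keep its attributes as well.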

hadley, Aug 28 '18 14:08