assertr
Feature Request: `assert_cols()`
I often want to do a set of checks on the columns of data.frames before doing checks on the values within each column itself.
For example, I want to check that all columns are present, and rather than use the has_names() function with verify(), I'd like the output to specify what column or columns are missing. Similarly, I use verify(is.numeric(numeric_column_1)) %>% verify(is.numeric(numeric_column_2)) when a cleaner report would look more like assert(is.numeric, numeric_column_1, numeric_column_2).
What would you think about an assert_cols() function?
I really like that idea. I'd like to make sure the semantics operate a lot like assert. Would it be like assert but on the vector of column names?
I think a lot of great functionality could come from assert_cols, such as checking for:
- duplicate column names
- ridiculous characters in column names
- missing columns (like you said)
- data type (like you said)
- a match against a regex pattern
- etc...
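To make the name-level checks in that list concrete, here is a minimal base R sketch. The helper `check_col_names()`, its arguments, and the default pattern are all hypothetical, made up for illustration; nothing like it is claimed to exist in assertr.

```r
# Hypothetical helper (not part of assertr): collect column-name problems
# so the report can say exactly which names are at fault.
check_col_names <- function(data, required = character(),
                            pattern = "^[A-Za-z][A-Za-z0-9_.]*$") {
  nms <- names(data)
  list(
    duplicated = unique(nms[duplicated(nms)]),  # names that appear more than once
    bad_chars  = nms[!grepl(pattern, nms)],     # names that do not fit the regex
    missing    = setdiff(required, nms)         # required names that are absent
  )
}

# Example data with a duplicated name and a name containing odd characters.
df <- data.frame(1:2, 3:4, 5:6)
names(df) <- c("id", "id", "bad name!")

problems <- check_col_names(df, required = c("id", "value"))
print(problems)
```

Returning the offending names, rather than a single TRUE/FALSE, is what would let the error message list the missing or malformed columns.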
Yeah, that covers a lot of the space I was thinking of. Generally, I'm thinking that it would be used in two different ways:
- On the column names (duplicate names, character check, missing columns, name regexp)
- On the column overall (data type is the main thing that I see here, as the regexp pattern of values within the column seems like it would be handled by assert(), but if you wanted the answer for the column name instead of the row, then the regexp method could apply here, too.)
I see the above two as different ways to use the data, so I'd think they would either be two different functions (e.g. assert_col_names() and assert_col(), my preference) or one function with two modes of use (e.g. assert_col(..., assert_type = c("names", "values"))).
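As a rough illustration of the two-function option, here is a base R sketch. Both functions are hypothetical: the names come from the proposal above, but the signatures, the `cols` argument, and the error messages are assumptions, not an existing assertr API.

```r
# Hypothetical sketch: apply a predicate to each column *name*.
assert_col_names <- function(data, predicate) {
  nms <- names(data)
  ok <- vapply(nms, function(nm) isTRUE(predicate(nm)), logical(1))
  if (!all(ok)) {
    stop("column names failing check: ",
         paste(nms[!ok], collapse = ", "), call. = FALSE)
  }
  invisible(data)  # return the data so the call can sit in a pipeline
}

# Hypothetical sketch: apply a predicate to each *whole column* (e.g. its type).
assert_col <- function(data, predicate, cols = names(data)) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  ok <- vapply(cols, function(col) isTRUE(predicate(data[[col]])), logical(1))
  if (!all(ok)) {
    stop("columns failing check: ",
         paste(cols[!ok], collapse = ", "), call. = FALSE)
  }
  invisible(data)
}

df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5), label = c("a", "b", "c"))

# Passes: every name is lowercase letters, and x and y are numeric.
checked <- assert_col(assert_col_names(df, function(nm) grepl("^[a-z]+$", nm)),
                      is.numeric, cols = c("x", "y"))

# Fails, and the message names the offending column.
msg <- tryCatch(
  assert_col(df, is.numeric, cols = c("x", "label")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Keeping the two functions separate means each predicate has one clear input, a name string or a column vector, which avoids a mode switch like `assert_type`.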
FYI, https://sfirke.github.io/janitor/reference/clean_names.html can help a lot with rational column naming, but it is a correction function rather than a checking function.
What do you think? Are there other use cases?