vctrs

Draft tabular API

Open lionel- opened this issue 4 years ago • 5 comments

lionel- · Dec 19 '19 16:12

Added tbl_ptype2() and tbl_ptype_common(). The latter is now used in vec_cbind() instead of vec_ptype_common().

lionel- · Dec 20 '19 13:12

So far this approach works with dynamically grouped data frames whose proxies instrument their columns: the proxy method wraps the grouping variables in a special df-col so that the restore method can reinstate their grouping status.
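A minimal base R sketch of this proxy/restore round trip, assuming a toy grouped class rather than dplyr's grouped_df. The names new_toy_gdf(), toy_proxy(), toy_restore(), the toy_gdf class, its "groups" attribute and the ".groups" column are all hypothetical and are not part of vctrs or dplyr:

```r
# Hypothetical toy grouped data frame: a plain data frame plus a "groups"
# attribute naming the grouping columns. This is NOT dplyr's grouped_df.
new_toy_gdf <- function(df, groups) {
  structure(df, groups = groups, class = c("toy_gdf", "data.frame"))
}

# Proxy: return a bare data frame in which the grouping columns are packed
# into a df-col called ".groups", so they travel through generic operations.
toy_proxy <- function(x) {
  groups <- attr(x, "groups")
  df <- x
  attr(df, "groups") <- NULL
  class(df) <- "data.frame"
  out <- df[setdiff(names(df), groups)]
  out$.groups <- df[groups]  # assigning a data frame creates a df-col
  out
}

# Restore: unpack the ".groups" df-col and reinstate the grouping status.
toy_restore <- function(proxy) {
  packed <- proxy$.groups
  proxy$.groups <- NULL
  new_toy_gdf(cbind(proxy, packed), names(packed))
}

x <- new_toy_gdf(mtcars[1:5, c("mpg", "cyl", "gear")], "cyl")
p <- toy_proxy(x)
names(p)           # "mpg" "gear" ".groups"
r <- toy_restore(p)
attr(r, "groups")  # "cyl"
```

In this toy the restored grouping columns are appended at the end rather than put back in their original position; a real implementation would presumably need to record and restore column order as well.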

However, I'm hitting a wall with statically grouped data frames. In the following example, the RHS can't be coerced to the statically grouped type because it doesn't contain the grouping variables:

```r
library(vctrs)

static_gdf <- dplyr::group_by(mtcars, cyl, .drop = FALSE)

vec_cbind(static_gdf, mtcars[10:11])
#> Error: Can't find grouping variables in data.
```

When we cast to the rectangle/tabular type of a statically grouped data frame, we apply group_data() to the input. If the input does not contain the grouping variables, we can't compute a group identifier for each row. Nor are these identifiers part of the tbl_type() of the statically grouped df, since that type is empty and contains no data. Hence the error.
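To make the failure concrete, here is a minimal base R sketch under simplifying assumptions: a single grouping variable, and a statically grouped "type" represented as a hypothetical list holding the grouping variable name and the full set of group keys but no row data. toy_static_type and toy_cast_static() are illustrative names, not the dplyr or vctrs implementation:

```r
# Hypothetical prototype of a statically grouped data frame: it records the
# grouping variable and all group keys (as with .drop = FALSE), but no rows.
toy_static_type <- list(var = "cyl", keys = c(4, 6, 8))

# Casting computes a group identifier for every row from the grouping column.
# Without that column there is nothing to compute the identifiers from,
# and the prototype itself carries no per-row data to fall back on.
toy_cast_static <- function(x, type) {
  if (!type$var %in% names(x)) {
    stop("Can't find grouping variables in data.", call. = FALSE)
  }
  attr(x, "group_id") <- match(x[[type$var]], type$keys)
  x
}

ok <- toy_cast_static(mtcars[c("mpg", "cyl")], toy_static_type)
head(attr(ok, "group_id"))  # 2 2 1 2 3 2
try(toy_cast_static(mtcars[10:11], toy_static_type))
```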

The problem stems from storing the grouping structure separately from the data. So if we want to allow removing grouping columns while keeping them active, we'll hit a similar error.

Not sure how to solve this. The easy way is to decide that:

  1. You have to ungroup statically grouped data frames before concatenation (or other combining operations) with data frames of different types.

  2. External grouping variables are not possible, or at least they prevent combination with vectors that have a different type.

Another way would be to include the row structure in tabular prototypes. In that case new_data_frame(list(), n = 0L) and new_data_frame(list(), n = 3L) would be two different types. This sort of makes sense, as it would be symmetric to the vector types of data frames, which include the column structure. It would make tabular/rectangle methods a bit trickier to write, but it might make the coercion system for colwise operations more flexible and useful. I will try this approach next.
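A minimal base R sketch of this idea, with hypothetical helpers that are not vctrs functions: toy_frame(n) stands in for new_data_frame(list(), n = n), toy_rct_ptype2() only finds a common type when both row counts agree, and toy_rct_ptype_common() assumes, by analogy with vec_ptype_common(), that the common type is obtained by reducing the pairwise function over all inputs:

```r
# Hypothetical stand-in for new_data_frame(list(), n = n): no columns, n rows.
toy_frame <- function(n) {
  structure(list(), class = "data.frame",
            row.names = .set_row_names(as.integer(n)))
}

# Rectangle prototype: keep the row structure, drop all the columns.
toy_rct_ptype <- function(x) toy_frame(nrow(x))

# Pairwise common type: only defined when the row structures agree.
toy_rct_ptype2 <- function(x, y) {
  if (nrow(x) != nrow(y)) {
    stop(sprintf("Can't combine frames of %d and %d rows.", nrow(x), nrow(y)),
         call. = FALSE)
  }
  toy_frame(nrow(x))
}

# Common type of several inputs (assumed design): reduce the pairwise function.
toy_rct_ptype_common <- function(...) {
  Reduce(toy_rct_ptype2, lapply(list(...), toy_rct_ptype))
}

identical(toy_frame(0L), toy_frame(3L))            # FALSE: two different types
nrow(toy_rct_ptype_common(mtcars, mtcars[10:11]))  # 32
try(toy_rct_ptype_common(mtcars, mtcars[1:3, ]))
```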

lionel- · Dec 21 '19 12:12

Won't vec_rbind()ing two grouped data frames need tabular support?

hadley · Dec 21 '19 17:12

> Won't vec_rbind()ing two grouped data frames need tabular support?

No, because this is a vector operation. The coercion to rows is done by an ad hoc function, not by the rectangle API, so the relevant type for rowwise combination is the vector type, which includes the column structure.

By contrast, the relevant type for colwise combination is the rectangle type, which will include the rowwise structure (once I switch to rct_ptype(mtcars) === new_data_frame(list(), n = nrow(mtcars)) as outlined in the last paragraph of https://github.com/r-lib/vctrs/pull/711#issuecomment-568178172).
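The contrast between the two kinds of prototype can be sketched in base R. toy_vec_ptype() and toy_rct_ptype() are hypothetical helpers, not vctrs functions: the first keeps the column structure and drops the rows (the type relevant to rowwise combination), the second keeps the row count and drops the columns (the type relevant to colwise combination):

```r
# Vector prototype (hypothetical): all columns, zero rows.
toy_vec_ptype <- function(x) x[0, , drop = FALSE]

# Rectangle prototype (hypothetical): zero columns, same number of rows as x.
toy_rct_ptype <- function(x) {
  structure(list(), class = "data.frame", row.names = .set_row_names(nrow(x)))
}

# Rowwise combination compares column structures...
identical(toy_vec_ptype(mtcars[1:3, ]), toy_vec_ptype(mtcars[4:6, ]))  # TRUE
identical(toy_vec_ptype(mtcars), toy_vec_ptype(mtcars[10:11]))         # FALSE

# ...while colwise combination compares row structures.
identical(toy_rct_ptype(mtcars), toy_rct_ptype(mtcars[10:11]))         # TRUE
identical(toy_rct_ptype(mtcars[1:3, ]), toy_rct_ptype(mtcars[1:5, ]))  # FALSE
```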

lionel- · Dec 22 '19 10:12

The row structure (nrow and row names) is now included in the frame prototypes. For data frames, row names must match (possibly out of order) to get a common type.

TODO:

  • Custom conditions for failed casts and failed common-type computations.
  • tbl-cast should reorder input based on row names.
  • Delegate to [ for non-proxied classes.
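A minimal base R sketch of the row-name rule and of the reordering cast listed in the TODO, assuming data frames with unique row names. toy_frame_ptype2() and toy_frame_cast() are hypothetical names, not the functions in this PR:

```r
# Hypothetical common frame type: row names must be the same set, possibly
# out of order. The prototype keeps x's row order and drops every column.
toy_frame_ptype2 <- function(x, y) {
  rx <- row.names(x)
  ry <- row.names(y)
  if (length(rx) != length(ry) || !setequal(rx, ry)) {
    stop("Can't find a common frame type: row names differ.", call. = FALSE)
  }
  x[0]
}

# Hypothetical cast: reorder the rows of x to follow the prototype's row names.
toy_frame_cast <- function(x, to) {
  idx <- match(row.names(to), row.names(x))
  if (nrow(x) != nrow(to) || anyNA(idx)) {
    stop("Can't cast: row names differ.", call. = FALSE)
  }
  x[idx, , drop = FALSE]
}

a <- mtcars[1:3, "mpg", drop = FALSE]
b <- mtcars[3:1, "cyl", drop = FALSE]  # same rows, reversed order
ptype <- toy_frame_ptype2(a, b)
cbind(a, toy_frame_cast(b, ptype))     # rows aligned by name
```

Reordering in the cast rather than in the common-type step keeps the prototype a pure description of the row structure, which is one possible way to split the work between the two operations.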

lionel- · Jan 03 '20 09:01