openxlsx2 icon indicating copy to clipboard operation
openxlsx2 copied to clipboard

Improve dims

Open JanMarvin opened this issue 1 year ago • 4 comments

Due to some merged cell styles sometimes something like A1;B2:C2;A3:C5 is needed to import to a data frame. Sadly this is currently not possible.

Our selection of multiple cells always creates a square from the most left to the most right cell:

library(openxlsx2)

x <- matrix(1:9, 3, 3)

wb <- wb_workbook()$add_worksheet()$add_data(x = x, col_names = FALSE)

wb_to_df(wb, dims = "A1;B2;C3", col_names = FALSE)
#>   A B C
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9

It would be beneficial if this would return 1, 5, 8 instead, or a data frame where all other cells are NA. But if I remember correctly this requires at least modifications of dims_to_dataframe() and the question remains if this "A1;B2;C3" should return a vector or a data frame of a single row or ... something entirely different.

JanMarvin avatar Sep 17 '23 10:09 JanMarvin

My 2 cents:

I think wb_dim() could handle this?

I specifically disallowed this when I wrote it to avoid unexpected things.

I think that keeping a length 1 would be good, (and create less surprise), but maybe I am wrong.

Do you think all functions can handle dims as "A1;B3"?

library(openxlsx2)
# Last version
packageVersion("openxlsx2")
#> [1] '1.0.0.9000'
wb_dims(x = mtcars, cols = "vs")
#> [1] "H2:H33"
wb_dims(x = mtcars, cols = "am")
#> [1] "I2:I33"
wb_dims(x = mtcars, cols = "cyl")
#> [1] "B2:B33"
# H2:I33
wb_dims(x = mtcars, cols = c("vs", "am"))
#> Error in wb_dims(x = mtcars, cols = c("vs", "am")): Supplying multiple column names is not supported by the `wb_dims()` helper, use the `cols` with a range instead of `x` column names.
#>  Use a single `cols` at a time with `wb_dims()`
# B1:B33;I1:I33
wb_dims(x = mtcars, cols = c("cyl", "am"), select = "x")
#> Error in wb_dims(x = mtcars, cols = c("cyl", "am"), select = "x"): Supplying multiple column names is not supported by the `wb_dims()` helper, use the `cols` with a range instead of `x` column names.
#>  Use a single `cols` at a time with `wb_dims()`

Created on 2023-09-27 with reprex v2.0.2

Other suggestions in https://github.com/JanMarvin/openxlsx2/pull/990#issuecomment-2052130605

olivroy avatar Sep 27 '23 14:09 olivroy

Yes I do. But to make it work consistently, some development and lots of testing has to be done. After all there is a reason why it doesn't work right now

JanMarvin avatar Sep 27 '23 16:09 JanMarvin

I was thinking about a data validation book chapter and was wondering if it makes sense to add sheet names to wb_dims(). For this wb_dims() should become a workbook wrapper, since it needs to know the sheet names.

Draft:

wb$dims(sheet = "MySheet", x = mtcars)
#> MySheet!A1:K33

wb$dims(sheet = "My Sheet", x = mtcars)
#> 'My Sheet'!A1:K33

If we grep for ! in dims we could string split sheet and dims from there. In the current data validation list example we use a spreadsheet reference: "'Sheet 2'!$A$1:$A$10". This could be provided by something like: wb$dims(sheet = 2, cols = 1, rows = seq_len(10)). But I'm not entirely sure where else we would benefit from this instead of potentially being closer to openxml formulas.

The following functions would be identical:

wb$to_df(dims = wb$dims(sheet = "Sheet2", x = matrix(1, 2, 2), col_names = FALSE)
wb$to_df(dims = "Sheet2!A1:B2")
wb_to_df(wb, sheet = "Sheet2", dims = wb_dims(wb = wb, x = matrix(1, 2, 2), col_names = FALSE)
wb_to_df(wb, sheet = "Sheet2", dims = "A1:B2")

The data validation example would become something like this:

# data validation example
wb <- wb_workbook()$
  add_worksheet("Sheet 1")$
  add_worksheet("Sheet 2")$
  add_data_table(sheet = 1, x = iris[1:30, ])$
  add_data(sheet = 2, x = sample(iris$Sepal.Length, 10))

# with wb_dims draft
wb$add_data_validation(
  dims = wb$dims(sheet = 1, x = iris[1:30, ], cols = 1),
  type = "list",
  # "'Sheet 2'!$A$1:$A$10"
  value = wb$dims(sheet = 2, cols = 1, rows = seq_len(10))
)

JanMarvin avatar Dec 18 '23 12:12 JanMarvin

Further dims improvements:

It should be possible to select every row/col similar to spreadsheet software:

  • $A:$D every row in the column range (we partially support this is wb_data() iirc), at least it is possible in pivot table creation

  • $1:$5 every column in the row range

  • Support dollar signs in dims (it does not matter to us, but we currently fail to read from wb_to_df(wb, dims = "$A1:$B2")

  • If sheet and dims are written as single string wb_to_df(wb, dims = "Sheet1!A1:B2") we should be able to split sheet and dims into sheet and dims

JanMarvin avatar Mar 23 '24 09:03 JanMarvin